Abstract

A common complaint of hearing-aid users is the difficulty encountered when listening
to a talker in a noisy environment. Conventional hearing aids amplify all sounds
without discriminating between the desired source (target) and background noises
(jammers). These devices increase the overall sound levels, but do nothing to improve
target-to-jammer ratio (TJR). Research on microphone-array hearing aids is
motivated by the lack of success of single-microphone systems, as well as the documented
advantages of binaural hearing and multiple-element sensing systems.

Array processing can be classified as either fixed (time invariant) or adaptive (time
varying). Previous work on microphone-array hearing aids has demonstrated that under
certain conditions, adaptive arrays can provide significantly better performance
than simpler fixed arrays. The benefit of adaptive systems is realized when the input
TJR is low and when the signals arriving via direct paths are stronger than the
reflections. This benefit is reduced or eliminated at high TJR or in strong reverberation.
This work studies modified adaptive algorithms to improve performance at high
TJR and in reverberation; it also provides complete specifications for the design of
an adaptive microphone-array hearing aid.

In  particular,  two  previously  proposed  ad  hoc  methods  for  controlling  adaptation 
at  high  TJR  are  analyzed  and  evaluated.  The  results  confirm  the  usefulness  of  these 
methods  and  provide  guidelines  for  selecting  relevant  parameters  in  anechoic  and 
reverberant  environments.  In  addition,  an  analysis  of  the  specific  causes  of  target 
cancellation  in  reverberation  reveals  that  a  simple  set  of  parameter  choices  can  solve 
this  problem. 

Computer simulations of the complete system demonstrate its benefits in a variety
of acoustic environments. Steady-state results show that the system provides very
large improvements in relatively anechoic environments. Substantial benefits are provided
in moderate reverberation, particularly if relatively long filters (~100 ms) are
used. In extreme reverberation, performance is comparable to that obtained with the
underlying non-adaptive microphone array. Transient results indicate that convergence
is sufficiently rapid for processing speech signals. The number of microphones
required in a practical system and the use of directional microphones are discussed.

Thesis  Supervisor:  Patrick  M.  Zurek 

Title:  Principal  Research  Scientist,  Research  Laboratory  of  Electronics 

Chapter 1


Introduction 

A common complaint of hearing-aid users is the difficulty encountered in listening
to talkers in noisy environments (Plomp, 1978; Smedley and Schow, 1990). Conventional
hearing aids amplify all sounds without discriminating between the desired
source (target) and background noises (jammers). These devices increase the overall
sound levels, but do nothing to improve target-to-jammer ratio (TJR). A variety
of techniques have been investigated for single-microphone speech enhancement, but
none of these techniques has improved speech intelligibility in the presence of broadband
jammers such as competing speech (Lim and Oppenheim, 1979; Weiss and
Neuman, 1993; Dillon and Lovegrove, 1993). The lack of success of single-microphone
systems, together with the documented advantages of binaural hearing and, more
generally, multiple-element sensing systems, has led to substantial research interest
in microphone-array hearing aids.

The ideal hearing aid is one that replaces the functions of normal binaural hearing,
providing a signal or signals that allow the listener to focus on one source while
simultaneously monitoring other directions (Durlach and Colburn, 1978). An artificial
system to replace binaural capabilities could be composed of two stages. The first
stage would decompose the acoustic environment into directional channels, each containing
an isolated signal emanating from a particular direction. The second stage
would then consist of a coding scheme that would allow the user to focus on any single
channel while simultaneously monitoring all other channels.

A necessary component of the ideal hearing aid, and one that would be useful in
its  own  right,  is  a  system  that  maximizes  the  target-to-jammer  ratio  (TJR)  assuming 
a  known  target  direction.  Even  in  the  absence  of  schemes  for  separating  and  coding 
multiple  directional  sources,  this  component  of  the  first  stage  could  be  incorporated 
in  a  system  with  user  controls  for  steering  to  a  selected  target  direction,  or  in  which 
the  target  direction  is  fixed. 

Previous  research  has  demonstrated  that  adaptive  microphone-array  systems  have 
potential  as  an  effective  way  to  perform  this  signal  extraction  (Greenberg  and  Zurek, 
1992).  However,  several  pressing  problems  remain  that  must  be  addressed  before 
microphone  arrays  can  perform  successfully  in  a  variety  of  acoustic  environments. 
This  thesis  proposes  solutions  to  those  problems  and  demonstrates  their  effectiveness 
with  computer  simulations.  The  goal  of  this  work  is  the  development  of  a  practical 
system  for  microphone-array  hearing  aids. 

Although  the  focus  of  this  thesis  is  the  improvement  of  conventional  hearing  aids, 
the  results  are  applicable  to  other  aids  such  as  cochlear  implants.  More  generally, 
the  system  described  in  this  thesis  may  be  of  use  in  any  situation  where  reduction 
of  interference  from  spatially-separated  sound  sources  is  required.  Examples  of  such 
situations  include  general  microphone  systems,  hands-free  telephones,  teleconference 
systems,  and  automatic  speech  recognition  devices. 

The remainder of this thesis is organized as follows. Chapter 2 reviews relevant
signal processing concepts and previous work on microphone-array hearing aids. It
also contains a summary of problems identified by previous work and motivations
for the solutions investigated in the following chapters. Chapter 3 describes methods
common to several aspects of the current work, including source materials, simulated
rooms, and the performance metric. Chapters 4-6 each address a particular issue with
implications for designing microphone-array hearing aids. The results of these three
chapters specify a modified adaptive algorithm that is subsequently implemented
in computer simulations. The results of those simulations are presented in Ch. 7,
illustrating the performance that can be obtained with adaptive microphone-array
systems in a variety of acoustic environments. Chapter 8 contains a discussion that
includes recommendations for future work, and Ch. 9 consists of a summary and
conclusions.

Chapter 2


Background 


2.1  Array  Processing 

This section provides a brief description of signal processing concepts and terminology
relevant to this work. A thorough presentation of array processing, beamforming, and
adaptive signal processing can be found in the extensive literature available on these
subjects (e.g., Van Veen and Buckley, 1988; Widrow and Stearns, 1985; Monzingo
and Miller, 1980; Haykin, 1985; Haykin, 1986; Johnson and Dudgeon, 1993).

The bulk of research concerning the design and analysis of array processors has
been for applications in radar, sonar, and geophysics. Although the basic principles
and some algorithms from these fields are applicable to the hearing-aid problem,
several significant differences exist. First, in hearing aids the signals are speech, a
broadband signal, while much of the array processing literature is restricted to the
somewhat simpler narrowband case. Second, assuming that cosmetic considerations
limit the design to head-sized arrays, for the hearing-aid problem the spatial aperture
will be small relative to the wavelengths of interest. Furthermore, whereas in
some fields the concept of multipath refers to a small number of reflections with substantially
less energy than the direct signal, in a typical room the reverberant sound
arrives from countless directions and may have significantly more energy than the
direct sound. And finally, although this work does not directly address the issues
of implementing an algorithm in a practical hearing aid, necessary restrictions on
processor size and power consumption will ultimately impose severe limitations on
computational complexity in a wearable device.

Recent literature does contain some applications of microphone arrays to speech
processing. In addition to hearing aids, these applications include hands-free telephony
(Goulding and Bird, 1990; Claesson et al., 1991; Grenier, 1993), preprocessing
for speech recognition (Van Compernolle et al., 1991; Parry, 1990), cockpit communication
systems (Harrison et al., 1986), and general microphone systems (Kaneda and
Ohga, 1986; Lu and Clarkson, 1993; Flanagan et al., 1991).

Array processors can be classified as either fixed or adaptive. Fixed or data-independent
processing applies fixed filters to each microphone signal and sums the
results to produce a single output. The weights are typically selected to optimize
a quantity such as directivity (the array’s response to a signal from straight ahead
relative to its diffuse-field response). On the other hand, adaptive processors utilize
time-varying filters that are adjusted to approach a statistical optimum (in a
least-squares sense) while tracking changes in the environment. Adaptive processing
usually requires more intensive computation than fixed processing, but may provide
better performance against directional and time-varying jammers. The advantage of
adaptive processing is realized if the underlying optimum processor outperforms a
fixed processor with an equal number of sensors and if the non-stationarity in the
environment is slow relative to the time required for the adaptive algorithm to converge.
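
As a rough illustration of the directivity measure just described, the following Python sketch evaluates it for a pre-steered line array of omnidirectional microphones with the target at broadside, so that the target arrives in phase at every microphone. The function name, the NumPy dependency, the sound speed of 343 m/s, and the standard sin(kd)/(kd) coherence model for a spherically isotropic (diffuse) field are assumptions made here for illustration; none of them is taken from this work.

```python
import numpy as np

def directivity_db(weights, positions_m, freq_hz, c=343.0):
    """Directivity in dB of a broadside line array of omnidirectional microphones.

    weights     : real weight applied to each microphone at this frequency
    positions_m : microphone positions along the array axis, in meters
    freq_hz     : frequency at which the directivity is evaluated
    """
    w = np.asarray(weights, dtype=float)
    pos = np.asarray(positions_m, dtype=float)
    k = 2.0 * np.pi * freq_hz / c                   # wavenumber
    dist = np.abs(pos[:, None] - pos[None, :])      # pairwise microphone spacing
    # Diffuse-field coherence between omni microphones: sin(k d) / (k d).
    # np.sinc(x) computes sin(pi x) / (pi x), hence the division by pi.
    gamma = np.sinc(k * dist / np.pi)
    target_power = np.sum(w) ** 2                   # in-phase target response
    diffuse_power = w @ gamma @ w                   # response to the diffuse field
    return 10.0 * np.log10(target_power / diffuse_power)
```

A single microphone gives 0 dB by construction, and the value for several microphones depends on spacing and frequency through the coherence term. Directional elements, which this sketch does not model, would require a different coherence matrix.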

The choice of the optimality criterion for an adaptive system is dictated by the
information assumed to be available about the signals and the environment. Peterson
(1989) showed that many different optimum processors (minimum mean-square
error, maximum a posteriori probability, maximum likelihood, and minimum variance)
are identical to within a scalar function of frequency dependent only on the
assumed a priori knowledge of the target and jammer spectra. Two adaptive systems
often considered for use in the hearing aid application are the adaptive noise canceller
(Widrow et al., 1975) and the linearly constrained minimum variance (LCMV)
beamformer (Frost, 1972). The adaptive noise canceller (ANC) requires a reference
signal related to the jammer, but free of target. The LCMV beamformer requires
that the target direction is known and that the target signal is uncorrelated with all
jammer signals.

A  block  diagram  of  the  adaptive  noise  canceller  is  shown  in  Figure  2.1.  This  system 
requires  two  inputs,  a  primary  signal  that  contains  target  plus  jammer  and  a  reference 
signal  that  ideally  contains  a  filtered  version  of  the  jammer  only.  An  adaptive  filter 
then operates on the reference input using the LMS algorithm (Widrow and Stearns,
1985)  to  minimize  the  total  output  power  of  the  system.  If  the  reference  contains  no 
target  and  if  the  target  and  jammer  are  uncorrelated,  minimizing  the  total  output 
power  is  equivalent  to  minimizing  the  jammer  power.  For  the  hearing-aid  application, 
it  is  usually  not  possible  to  obtain  a  reference  signal  that  is  perfectly  free  of  target. 
Instead,  the  reference  signal  is  obtained  from  either  a  directional  microphone  pointed 
away  from  the  target  or  a  remote  microphone  placed  close  to  the  noise  source.  Any 
‘leakage’  of  the  target  into  the  reference  channel  can  lead  to  cancellation  of  the  target, 
a  situation  to  be  avoided.  Although  the  results  of  previous  work  with  adaptive  noise 
cancellers  can  provide  insight  into  issues  relevant  to  the  design  of  microphone-array 
hearing  aids,  this  work  will  only  consider  systems  that  do  not  require  a  ‘target-free’ 
reference  signal. 
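
The operation of the adaptive noise canceller can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name, the default filter length and step size, and the LMS update written as w + 2*mu*y*x (one common textbook convention) are all choices made here.

```python
import numpy as np

def lms_noise_canceller(primary, reference, num_taps=16, mu=0.002, delay=0):
    """Adaptive noise canceller: subtract an LMS-filtered reference from the primary.

    primary   : target plus filtered jammer
    reference : (ideally) a filtered version of the jammer only
    delay     : delay in samples applied to the primary signal
    Returns the output signal and the final adaptive weights.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(num_taps)                  # adaptive weights w_k[n], k = 0..L-1
    x_buf = np.zeros(num_taps)              # most recent reference samples
    y = np.zeros(len(primary))
    for n in range(len(primary)):
        x_buf = np.roll(x_buf, 1)           # shift the delay line by one sample
        x_buf[0] = reference[n]
        d = primary[n - delay] if n >= delay else 0.0
        y[n] = d - w @ x_buf                # output = delayed primary - filter output
        w = w + 2.0 * mu * y[n] * x_buf     # LMS step that reduces output power
    return y, w
```

Because the weights are driven by the full output y[n], any target present in the output perturbs the update; the consequences of this are discussed in Sec. 2.3.1.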

The  LCMV  beamformer  assumes  that  the  direction  of  the  target  signal  is  known, 
and  that  the  target  and  jammer  signals  are  uncorrelated.  The  weights  are  adjusted  to 
minimize  output  power  subject  to  constraints  that  apply  a  specified  filter  to  the  signal 
arriving  from  the  target  direction.  The  two  basic  structures  used  to  implement  LCMV 
beamforming  are  the  linearly  constrained  adaptive  array  processor  (Frost,  1972)  and 
the  generalized  sidelobe  canceller  (Griffiths  and  Jim,  1982),  shown  in  Figs.  2.2  and 
2.3.  For  simplicity,  the  implementations  in  the  figures  do  not  show  the  initial  stage 
of  time-delay  steering  required  to  align  the  array  to  the  target  signal. 
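
In symbols, the LCMV criterion can be stated as follows. The notation is assumed here for illustration and is not taken from this work: w is the stacked vector of all filter weights, R is the covariance matrix of the stacked microphone samples, C is the constraint matrix, and f holds the coefficients of the filter specified for the target direction. The closed-form solution shown is the standard one for this constrained minimization (Frost, 1972).

```latex
\begin{aligned}
\min_{\mathbf{w}} \; & \mathbf{w}^{T}\mathbf{R}\,\mathbf{w}
\quad \text{subject to} \quad \mathbf{C}^{T}\mathbf{w} = \mathbf{f}, \\
\mathbf{w}_{\mathrm{opt}} &= \mathbf{R}^{-1}\mathbf{C}
\left(\mathbf{C}^{T}\mathbf{R}^{-1}\mathbf{C}\right)^{-1}\mathbf{f}.
\end{aligned}
```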

Figure 2-1: Block diagram of adaptive noise canceller. The two inputs are a primary
signal, containing filtered versions of the target and jammer sources, and a reference
signal, containing a different filtered version of the jammer. The reference signal is
the input to an L-point adaptive filter. The adaptive noise canceller output, y[n],
is the difference between the delayed primary signal and the output of the adaptive
filter. The adaptive weights, w_k[n] for k = 0, ..., L − 1, are adjusted to minimize
the total output power, which, under ideal conditions, preserves the target signal and
minimizes the jammer output power.

Figure 2-2: Block diagram of linearly constrained adaptive array processor. Each
microphone signal is processed by a tapped delay line with adaptive weights updated
by an iterative constrained minimization algorithm.

Figure 2-3: Block diagram of generalized sidelobe canceller. The upper channel imposes
fixed constraints, while the lower channel consists of a blocking matrix that
removes the target signal, followed by adaptive filters that perform unconstrained
minimization on the remaining signals.

The structure used to implement Frost’s linearly constrained adaptive array processor
consists of a tapped delay line for each microphone signal. There is a single
adaptive weight associated with each tap, and the output consists of the sum of all
weighted tap values. The adaptive weights are updated by an iterative constrained
minimization algorithm. The constraints are selected to provide the desired response
for signals arriving from the target direction (typically unity gain).
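
One iteration of Frost's constrained update, as described above, might look like the following Python sketch. It assumes a pre-steered array, for which the constraints require the weights at each tap to sum across microphones to the corresponding coefficient of the desired target response; in that case the projection onto the constraints reduces to removing the across-microphone mean at each tap. The function name, argument layout, and step-size convention are hypothetical.

```python
import numpy as np

def frost_update(w, x, f, mu):
    """One iteration of a Frost-style constrained LMS update (pre-steered array).

    w, x : arrays of shape (J, M) holding the adaptive weights and the current
           tapped-delay-line samples (J taps per microphone, M microphones)
    f    : length-J desired impulse response for the target direction
    mu   : step size
    Returns the updated weights and the array output for this sample.
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.sum(w * x)                        # output: sum of all weighted taps
    w = w - mu * y * x                       # unconstrained gradient step
    # Project back onto the constraints: at every tap j the weights across
    # the M microphones must sum to f[j].
    w = w - w.mean(axis=1, keepdims=True) + (np.asarray(f) / w.shape[1])[:, None]
    return w, y
```

With f equal to a unit impulse, a signal that is identical at all microphones passes through the array with unity gain regardless of how the weights have adapted.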

The  generalized  sidelobe  canceller  proposed  by  Griffiths  and  Jim  (1982)  consists 
of  two  substructures  that  together  act  to  minimize  the  total  output  power  subject 
to  the  constraints.  The  upper  channel  forms  a  weighted  sum  of  the  sensor  signals 
(essentially  a  fixed  array  processor),  and  then  processes  this  sum  by  an  FIR  filter 
that  imposes  the  desired  filtering  described  by  the  constraints  (again,  typically  unity 
gain).  The  lower  channel  consists  of  a  blocking  matrix  that  combines  the  sensor 
signals  so  as  to  remove  the  target  signal,  followed  by  an  adaptive  algorithm  that 
performs  unconstrained  minimization  on  the  remaining  signals.  Equivalence  of  the 
Frost and Griffiths-Jim processors can be shown for a variety of conditions (Griffiths
and  Jim,  1982). 

Figure  2.4  shows  a  simple  and  useful  form  of  the  generalized  sidelobe  canceller, 
again  assuming  that  the  target  signal  was  previously  equalized  across  microphones. 
In  this  case,  the  constraints  consist  of  averaging  the  M  microphone  signals  and  then 
delaying  that  primary  signal  by  D  samples.  The  purpose  of  the  delay  is  to  permit 
the  adaptive  filter  in  the  lower  channel  to  form  non-causal  responses  (Widrow  and 
Stearns,  1985).  The  blocking  matrix  consists  of  taking  the  difference  between  pairs  of 
microphone signals to produce M − 1 target-free reference signals. For any combination
of pairs selected so that the blocking matrix has full rank, the optimal solution
for the adaptive weights will be identical.
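
The fixed part of this simple sidelobe canceller can be sketched as follows. The function name and array layout are assumptions, and differences between adjacent microphones are used as one of the many pairings that give a full-rank blocking matrix.

```python
import numpy as np

def gsc_front_end(mics, delay):
    """Fixed upper channel and blocking matrix of the simple sidelobe canceller.

    mics  : array of shape (M, N), microphone signals already steered to the target
    delay : D, the delay in samples applied to the averaged (primary) signal
    Returns (primary, references) with shapes (N,) and (M - 1, N).
    """
    mics = np.asarray(mics, dtype=float)
    avg = mics.mean(axis=0)                          # average the M microphones
    primary = np.concatenate([np.zeros(delay), avg])[:avg.size]  # delay by D
    references = mics[:-1] - mics[1:]                # adjacent-pair differences
    return primary, references
```

When the target is identical at every microphone, the references contain no target at all, while the primary is simply the target delayed by D samples.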

There  is  a  wide  variety  of  adaptive  algorithms  available  for  implementing  the 
unconstrained  minimization  required  by  the  generalized  sidelobe  canceller.  The  LMS 
algorithm  is  often  used  because  of  its  simplicity.  When  the  LMS  algorithm  is  used  in 
conjunction with a two-microphone version of the generalized sidelobe canceller shown
in Fig. 2.4, the system is equivalent to a preprocessor consisting of taking the sum
and  difference  of  the  microphone  signals,  followed  by  an  adaptive  noise  canceller.  In 
any  case,  steering  errors  or  imperfections  in  the  blocking  matrix  can  cause  leakage  of 
the  target  signal  into  the  reference  signal,  resulting  in  target  cancellation  as  described 
above  in  conjunction  with  the  adaptive  noise  canceller. 

Figure 2-4: Block diagram of a simple generalized sidelobe canceller with M microphones.
The fixed constraints in the upper channel preserve the target by averaging
the microphone signals and delaying the result. The blocking matrix is implemented
in the lower channel by taking the difference between pairs of microphone signals.


2.2  Microphone-array  hearing  aids 

2.2.1  Fixed  array  processors 


Previous work has established the potential benefits of fixed arrays for the hearing-aid
application (Peterson, 1989; Soede et al., 1993a,b; Stadler and Rabinowitz, 1993;
Kates, 1993). Peterson (1989) considered the performance of both fixed and adaptive
arrays; that work is discussed in Sec. 2.2.2.

Soede  et  al.  (1993a,b)  designed,  constructed,  and  evaluated  fixed  arrays  for  use 
as hearing aids. They considered linear arrays consisting of five evenly-spaced cardioid
microphones  mounted  on  eyeglass  frames  in  both  broadside  and  endfire  configurations.^ 
Physical  measurements  showed  that  these  arrays  provide  gains  in  signal-to-noise  ratio 
of  6-7  dB  in  diffuse  noise.  Intelligibility  tests  with  hearing-impaired  listeners  showed 
improvements  of  7  dB  in  the  speech  reception  threshold. 

Stadler and Rabinowitz (1993) also considered linear broadside and endfire arrays.
They applied sensitivity-constrained optimum beamforming (Cox et al., 1986)
to  fixed  arrays  with  directional  microphone  elements,  providing  a  design  method  that 
controls  the  tradeoff  between  directionality  and  noise  sensitivity.  They  computed  the 
theoretical  performance  of  these  arrays  in  free  space  for  various  numbers  and  types 
of  microphones.  Their  results  predict  gains  comparable  to  those  seen  by  Soede  et 
al.  (1993a,b)  for  the  same  array  configurations.  For  endfire  arrays,  their  results  show 
that using frequency-dependent weights with four or five microphones provides
directivities of 8-10 dB, regardless of the type of microphone. For broadside arrays,
directional  microphones  provide  a  clear  advantage,  but  there  is  little  advantage  to 
using frequency-dependent weights. A broadside array of two or five cardioid or
supercardioid elements with uniform weights (simply averaging the microphone signals)
provides  directivities  of  7-8  dB.  Although  the  directivity  of  these  broadside  arrays  is 
roughly  constant  for  2-5  microphones,  increasing  the  number  of  microphones  reduces 
the  noise  sensitivity,  making  the  system  more  robust. 

^Microphones in a broadside array form a line perpendicular to the target direction, while
microphones in an endfire array are collinear with the target direction.

2.2.2 Adaptive array processors

A number of researchers have considered adaptive microphone-array systems for application
to hearing aids (Weiss, 1987; Schwander and Levitt, 1987; Chabries et al.,
1987; Brey et al., 1987; Chazan et al., 1987; Peterson, 1989; Peterson et al., 1990;
Van Compernolle, 1990a; Farassopoulos, 1992; Greenberg and Zurek, 1992; Dillier
et al., 1993; Kollmeier et al., 1993; Hoffman et al., 1994; Link, 1994). Unlike fixed
arrays, there are a variety of ways to design, implement, and evaluate these adaptive
processors, which hinders comparisons among the different studies. Some of these
systems are based on the adaptive noise canceller and require a target-free reference
signal, and many are restricted to two-microphone arrays. In general, the results of
these studies indicate that adaptive microphone-array systems provide substantial
benefits under certain conditions.

Peterson (1989) calculated the optimum performance of LCMV beamformers in
the presence of directional and isotropic noise for head-sized free-space arrays based
on unlimited filter length. He considered how performance varies with the number of
sensors, internal sensor noise, array dimension, and array orientation. His results show
that in general, the performance of arrays designed to provide equal noise sensitivity
increases with the number of microphones, but, for head-sized arrays and realistic
levels of sensor noise, performance saturates and the improvement is negligible beyond
4-6 microphones. Once the number of microphones exceeds the number of directional
jammers, little or no additional benefit is obtained from adding more microphones.
Performance also increases with array length, except for arrays with a small number
of microphones where spatial undersampling occurs in the long arrays.

Most previous studies have considered relatively simple adaptive systems based
on two microphones. The performance of two-microphone systems decreases dramatically
when a second jammer source is introduced (Weiss, 1987; Peterson et al., 1990).
In theory, a system with M microphones can create M − 1 independent broadband
nulls, and therefore can effectively cancel M − 1 independent directional jammers.
As a result, a two-microphone array is only expected to perform well against a single
directional jammer.

Initial assessments have demonstrated the potential benefits of adaptive systems
based on the generalized sidelobe canceller (Peterson et al., 1990; Greenberg and
Zurek, 1992). These studies have shown that adaptive systems operating in anechoic
environments can provide 20-30 dB improvement based on both physical measurements
and intelligibility tests with normal-hearing listeners. Improvements of 3-15
dB have been reported for a variety of moderately reverberant environments.


2.3  Problems  and  proposed  solutions 

Previous studies have identified a number of problems with adaptive array systems,
and some have proposed solutions to those problems. The problems include misadjustment
of the adaptive algorithm, misalignment due to nonideal conditions, and
problems caused by reverberation.

2.3.1  Misadjustment  and  misalignment 

Misadjustment  of  adaptive  weights  is  an  unavoidable  result  of  any  adaptive  process 
using  a  stochastic  gradient  search  such  as  the  LMS  algorithm.  The  misadjustment 
is  defined  as  the  ratio  of  mean-squared  error  caused  by  the  adaptive  process  to  the 
minimum  mean-squared  error  produced  by  the  optimal  filter  (Widrow  and  Stearns, 
1985).  Because  the  adaptive  weights  are  driven  by  the  output  of  the  system,  when  a 
strong  target  signal  is  present  there  are  large  steps  in  the  weight  update  uncorrelated 
with  the  jammer  being  cancelled.  This  leads  to  reduced  jammer  cancellation  at  high 
TJRs.  The  effects  of  misadjustment  are  described  in  Peterson  et  al.  (1990)  and 
Greenberg  and  Zurek  (1992). 
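
The definition above can be written compactly. The symbols are assumed here for illustration: ξ_min is the minimum mean-squared error of the optimal filter, ξ_∞ is the steady-state mean-squared error of the adaptive filter, μ is the step size, and R is the covariance matrix of the adaptive filter inputs. The approximation on the right is the usual small-step-size result for LMS with the update convention noted, as given in standard texts such as Widrow and Stearns (1985).

```latex
\mathcal{M} \;=\; \frac{\xi_{\infty} - \xi_{\min}}{\xi_{\min}},
\qquad
\mathcal{M} \;\approx\; \mu \,\mathrm{tr}[\mathbf{R}]
\quad \text{(LMS, small } \mu\text{, update }
\mathbf{w} \leftarrow \mathbf{w} + 2\mu\, e\, \mathbf{x}\text{)}.
```

In the present setting the target itself contributes to ξ_min, since it is the part of the output that the filter cannot (and should not) remove, so the excess error grows with target power for a fixed step size.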

Another degrading effect is caused by misalignment of the array to the target
source. If the array is not perfectly aligned to the target, then some target signal
will leak through the constraints or the blocking matrix and can subsequently be
cancelled by the adaptive process. Even if the target direction is known exactly and
the array is perfectly steered, mismatched sensors and errors in sensor placement will
cause misalignment. The importance of target leakage was described by Widrow et
al. (1975), who showed that, for an unconstrained adaptive filter (one whose impulse
response can extend infinitely in both time directions), the target-to-jammer ratio
at the output equals the jammer-to-target ratio at the reference input. With any
fixed, non-zero transfer function in the leakage path, the problem clearly worsens as
the input TJR increases, leading to more target cancellation at higher TJRs. This
degradation caused by target leakage is seen with TJRs as low as 0 dB, and is clearly
detrimental at 10-20 dB (Peterson et al., 1990; Greenberg and Zurek, 1992).
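
The dependence on input TJR can be made explicit with a short derivation sketch. The symbols are assumptions introduced here: the primary input is taken to contain the target and jammer with power spectra Φ_tt(ω) and Φ_jj(ω), the reference contains the jammer through a path H(ω) and leaked target through a path G(ω), and the adaptive filter is the unconstrained optimum.

```latex
\mathrm{TJR}_{\mathrm{out}}(\omega)
\;=\; \frac{1}{\mathrm{TJR}_{\mathrm{ref}}(\omega)}
\;=\; \frac{|H(\omega)|^{2}\,\Phi_{jj}(\omega)}{|G(\omega)|^{2}\,\Phi_{tt}(\omega)}
\;=\; \frac{|H(\omega)|^{2}}{|G(\omega)|^{2}}
\cdot \frac{1}{\mathrm{TJR}_{\mathrm{in}}(\omega)},
\qquad
\mathrm{TJR}_{\mathrm{in}}(\omega) = \frac{\Phi_{tt}(\omega)}{\Phi_{jj}(\omega)}.
```

For fixed H and G, every decibel added to the input TJR is therefore removed from the output TJR of the ideal unconstrained filter.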

The  degradations  due  to  misadjustment  and  misalignment  are  both  proportional 
to  the  TJR.  Therefore,  a  general  solution  to  these  problems  is  based  on  controlling 
the  adaptive  process  so  that  adaptation  occurs  only  when  the  target  signal  is  weak  or 
absent.  Greenberg  and  Zurek  (1992)  accomplish  this  with  two  methods  for  controlling 
adaptation  at  high  TJR.  Both  methods  exploit  the  fact  that  the  target  signal  in  this 
application  -  speech  -  exhibits  a  high  degree  of  fluctuation,  and,  in  fact,  has  pause 
periods  when  the  target  is  absent.  Both  attempt  to  sense  the  TJR  and  to  adapt  only 
in  intervals  when  the  TJR  is  small. 

The first method is only effective when there is negligible leakage of the target
signal. When this condition is met and the input TJR is high, the system output
power will be greater than the power of the reference signal. The LMS weight update
equation is modified to normalize the step-size parameter with the sum of the reference
signal and output signal powers^ in order to reduce the size of target-induced weight
fluctuations. (This method is explained in detail in Ch. 4.) The approach is similar
to that taken by Duttweiler (1982) and Jeyendran and Reddy (1990).
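
The idea behind the first method can be sketched as follows. This is one plausible form, not the algorithm specified in Ch. 4: the function names, the constants alpha and beta, the first-order recursive power estimates, and the scaling by the number of taps are all assumptions made for illustration.

```python
import numpy as np

def step_size(alpha, num_taps, ref_power, out_power=0.0, eps=1e-12):
    """Step size normalized by reference power plus (optionally) output power.

    With out_power = 0 this is the conventional normalization by reference
    power alone; a strong target raises out_power and shrinks the step.
    """
    return alpha / (num_taps * (ref_power + out_power) + eps)

def normalized_lms_step(w, x_buf, y, ref_power, out_power, alpha=0.1, beta=0.99):
    """One LMS weight update using the sum-normalized step size.

    w, x_buf : adaptive weights and the matching reference samples
    y        : current system output (the LMS error signal)
    ref_power, out_power : running power estimates, updated and returned
    """
    ref_power = beta * ref_power + (1.0 - beta) * x_buf[0] ** 2
    out_power = beta * out_power + (1.0 - beta) * y ** 2
    mu = step_size(alpha, len(w), ref_power, out_power)
    return w + 2.0 * mu * y * x_buf, ref_power, out_power
```

When the output power is dominated by the target, the effective step size falls roughly in inverse proportion to target power, which limits the target-induced weight fluctuations without disabling adaptation outright.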

^Traditionally, the step-size parameter is normalized by the reference signal power alone.

The second method employs intermicrophone correlation to determine the range of
TJR. The straight-ahead target contributes a signal with correlation near unity, and
off-axis jammers have a correlation that is less than unity and depends on frequency
and direction. A running measure of the correlation between microphone signals will
vary with TJR and can be used as an indicator of relative target strength. For each
cycle of the adaptive algorithm, the correlation measure is compared to a threshold
and the adaptive process is inhibited (the weights are frozen at their current values)
if the correlation exceeds the threshold. Other researchers have proposed similar
mechanisms to disable the adaptive process in the presence of strong target signals
(Van Compernolle, 1990a,b; Harrison et al., 1986; Kaneda and Ohga, 1986; Sondhi
and Berkley, 1980; Dillier et al., 1993); they use various methods of estimating signal
powers to determine when adaptation should be disabled.
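
A minimal sketch of the correlation test used by the second method is given below. It computes a block-based correlation coefficient between two microphone signals rather than the running measure described above, and the function name and the threshold value of 0.5 are hypothetical; selection of the actual measure and threshold is the subject of Ch. 5.

```python
import numpy as np

def adaptation_enabled(mic_a, mic_b, threshold=0.5, eps=1e-12):
    """Decide whether the adaptive weights may be updated for this block.

    Returns False (weights frozen) when the intermicrophone correlation
    coefficient exceeds the threshold, which suggests a strong on-axis target.
    """
    a = np.asarray(mic_a, dtype=float)
    b = np.asarray(mic_b, dtype=float)
    rho = np.sum(a * b) / (np.sqrt(np.sum(a * a) * np.sum(b * b)) + eps)
    return bool(rho <= threshold)
```

In use, the weight update of the adaptive filter would simply be skipped whenever this function returns False, while the filter continues to process the signals with its current weights.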

Both of these methods for controlling adaptation at high TJR were shown to be
effective with a two-microphone generalized sidelobe canceller in an anechoic environment.
Using the two methods together significantly improved performance at high
TJRs and eliminated degrading target cancellation. Although these two methods were
shown to be effective, the resulting algorithms were not subject to thorough analysis
and many of the parameters were selected in an ad hoc manner. In the current
work, the two previously proposed methods of controlling adaptation are analyzed
individually, formal methods are developed for parameter selection in anechoic and
reverberant environments, and the performance of these methods is predicted and
verified under simplified conditions before application to the more complex situation
of speech in reverberation. Alternative methods for normalizing the step-size parameter
are analyzed in Ch. 4, and the use of intermicrophone correlation to determine
the range of TJR is analyzed in Ch. 5.

Hoffman  (1992)  proposed  an  alternate  technique  to  prevent  target  cancellation  due 
to  misalignment,  without  addressing  the  problem  of  misadjustment.  He  developed 
a  method  for  determining  linear  constraints  plus  a  quadratic  constraint  for  Frost’s 
linearly constrained adaptive array processor to prevent target cancellation beyond
an acceptable level (e.g., 3 dB). The constraints are based on a model of the sources of
misalignment that the system must accommodate, for example, errors due to
microphone locations. He demonstrates arrays of three, five, and seven microphones with
8  and  16  taps  per  filter.  Simulation  results  show  gains  of  10-20  dB  in  an  anechoic 
environment and 5-10 dB in moderate reverberation.

2.3.2 Reverberation


Previous work has shown that adaptive array processors provide some improvement
in reverberant environments, but that the benefit decreases as the degree of
reverberation increases. For some conditions, performance also decreases dramatically with
increasing TJR. Reverberation limits adaptive system performance in two ways. The
first is target cancellation resulting from violation of the system’s fundamental
assumptions. The second is reduced jammer cancellation due to increased complexity
of the acoustic environment.

Target  cancellation 

As  discussed  in  Sec.  2.1,  the  LCMV  beamformer  assumes  that  the  target  direction  is 
known and that target and jammer are uncorrelated. Target signal reflections
violate one of these two assumptions. If the reflected target signal is considered target,
then  the  assumption  of  known  target  direction  is  violated.  On  the  other  hand,  if 
the  reflected  target  is  considered  jammer,  the  assumption  of  uncorrelated  target  and 
jammer  is  violated.  Taking  the  latter  view,  it  is  instructive  to  consider  the  optimum 
performance  of  the  LCMV  beamformer  in  the  presence  of  correlated  interference. 
Reddy  et  al.  (1987)  considered  the  case  of  a  narrowband  target  with  a  single  jammer 
that  is  either  partially  or  fully  correlated  with  the  target.  Zoltowski  (1988)  extended 
the analysis to include multiple partially-correlated narrowband jammers. Their
results provide general expressions for the steady-state output power and quantify the
target cancellation due to correlated jammers. In general, as the correlation increases
between target and jammer signals, the LCMV beamformer exhibits progressive
deterioration in performance due to both diminished jammer rejection and increased
target  cancellation. 

Some  researchers  have  proposed  methods  to  overcome  the  problem  of  multipath 
or  correlated  jammers,  but  for  the  most  part  they  have  addressed  the  simpler  problem 
of a small number of reflections encountered in other applications. For example, one
approach is to include a model of the multipath in the design and null the reflections
before they enter the adaptive processor (Owsley, 1985). Obviously, this is only
appropriate when an accurate model of the multipath exists and is not applicable to
acoustic reflections in arbitrary rooms.

A more general approach to the problem of correlated jammers is spatial
dithering, intended to eliminate the correlation between the on-axis target and its off-axis
reflections. This was first suggested by Widrow et al. (1982), who proposed
mechanically moving the sensor array along a line perpendicular to the direction of arrival
of the desired signal. Shan and Kailath (1985) proposed a method called spatial
smoothing to accomplish the dithering. Spatial smoothing uses subarrays of the total
sensors and requires twice as many sensors as correlated signal sources, meaning that
in theory the system requires twice as many microphones as the number of target
reflections. Hoffman et al. (1991) propose a method of virtual dithering that performs
spatial  smoothing  by  applying  matrix  transformations  to  a  number  of  sensor  signals 
and  therefore  does  not  require  extra  physical  sensors.  However,  their  work  does  not 
indicate  how  many  virtual  transformations  need  to  be  performed  or  if  that  number 
is  related  to  the  number  of  target  reflections  as  in  spatial  smoothing.  Both  spatial 
smoothing  and  virtual  dithering  were  demonstrated  to  work  for  a  small  number  of 
narrowband  correlated  jammers.  The  problem  is  that  in  a  reverberant  environment, 
the  number  of  reflections  will  outnumber  the  number  of  microphones  or,  in  the  case 
of virtual dithering, the number of matrix transformations that can feasibly be
performed. In order to demonstrate that these techniques have any potential for the
hearing  aid  application,  they  must  first  be  analyzed  in  situations  where  the  target 
reflections  outnumber  the  microphones. 

Hoffman  et  al.  (1994)  suggest  a  simple  solution  to  the  problems  caused  by  target 
reflections;  they  note  that  appropriate  selection  of  the  parameters  of  the  generalized 
sidelobe  canceller  (in  particular,  setting  the  primary  delay  to  zero)  will  prevent  target 
cancellation.  This  idea  will  be  analyzed  and  extended  in  Ch.  6. 

Reduced  jammer  cancellation 

Reduced jammer cancellation can be examined by considering the performance of
LCMV beamformers against a single reverberant jammer in the absence of a target
signal. Because the system can operate only over the time-span of the adaptive filter,
jammer cancellation is impaired when the room impulse response exceeds that span.^
This effect is illustrated in Greenberg and Zurek (1992) for a two-microphone
generalized sidelobe canceller with two filter lengths in a variety of reverberant
environments. The results show that although extremely long filters may provide substantial
improvement  in  strong  reverberation,  the  filter  lengths  currently  attainable  with  a 
practical  system  only  provide  significant  improvements  at  source-to-array  distances 
less  than  the  critical  distance  of  the  room. 

When  the  filter  length  is  short  relative  to  the  impulse  response  of  the  room,  late 
reflections arriving at the array will be uncorrelated with the data in the filter.
Therefore, the adaptive filter will not be of any use in cancelling the late reflections.
Furthermore, it is reasonable to assume that the late reflections are equally likely to arrive
from  all  directions.  In  this  situation,  the  late  reflections  can  be  modeled  accurately 
as  isotropic  noise  (Beranek,  1954;  Cremer  and  Muller,  1982).  An  array  designed  to 
perform  optimally  against  isotropic  noise  has  maximum  directivity  (Peterson,  1989), 
which  is  typically  the  design  criterion  for  fixed  array  processors. 

In  terms  of  the  generalized  sidelobe  canceller  (Figs.  2.3  and  2.4),  as  reverberation 
increases,  the  reference  signals  at  the  adaptive  filter  inputs  become  progressively  less 
correlated with the primary signal in the upper channel. In the extreme, they are
completely  uncorrelated  and  the  adaptive  filter  weights  tend  to  zero.  In  this  case, 
the  system  output  simply  equals  the  primary  signal,  and  the  performance  of  the 
system  depends  on  the  fixed  processor  defined  by  the  constraints.  Therefore,  in  order 
to  design  an  adaptive  system  that  performs  optimally  in  extreme  reverberation,  the 
constraints  should  define  an  underlying  fixed  processor  with  maximum  directivity. 
The  design  choices  that  affect  the  fixed  processor  performance  are  the  number  of 
microphones,  their  directional  characteristics,  and  the  weights  implemented  in  the 
constraints. These issues will be discussed in Chs. 7 and 8.

^Strictly speaking, the impulse response is infinitely long for any real room. Here, the length of
the room impulse response refers to the length of time that non-negligible reflections continue to
propagate after the direct sound. For example, this measure could be the reverberation time of the
room (time required for reflections to decay 60 dB relative to the direct power).

It is important to note that obtaining the benefits of the underlying fixed
processor in extreme reverberation does not require any additional processing effort; rather,
these benefits are obtained automatically when the adaptive weights cannot further minimize the
output power. In some sense, the system can be thought of as a hybrid fixed/adaptive
array processor, where the fixed weights (constraints) provide a desired default
response in the absence of directional jammers, while the adaptive process utilizes the
degrees of freedom in the adaptive weights to reduce directional jammers when
possible. The result is a system that maximizes directivity when reverberation is strong,
yet allows adaptation to provide the additional benefit obtained from cancelling
directional jammers in less reverberant environments.


2.4  Goals 

The  goals  of  this  work  are  to  completely  specify  the  algorithm  for  use  in  adaptive 
microphone-array  hearing  aids  and  to  demonstrate  the  benefits  provided  by  such 
systems  in  a  variety  of  acoustic  environments.  The  algorithm  includes  several  features 
to  ensure  robustness  at  high  TJR  and  in  reverberation.  In  particular,  the  goals  of 
this  work  are: 

•  To provide a thorough analysis of the two ad hoc methods for controlling
adaptation proposed by Greenberg and Zurek (1992). This analysis will result in
guidelines for selecting relevant parameters and also will facilitate comparison
to similar methods proposed by other researchers.

•  To analyze the specific causes of target cancellation in reverberation and to
explain and extend previously suggested (Hoffman et al., 1994) parameter choices
to  eliminate  this  problem. 

•  To establish the usefulness of each of these methods individually via simple
simulations  that  isolate  the  relevant  effects. 

•  To  demonstrate  the  effectiveness  of  the  modified  algorithm  by  combining  these 
methods in computer simulations and evaluating performance under a range of
acoustic conditions. These simulations will also investigate selection of design
parameters such as the adaptive filter length and the number of microphones.

Chapter 3


Methods 


The  purpose  of  this  chapter  is  to  introduce  several  elements  that  are  common  to  the 
analyses  and  simulations  performed  in  Chs.  4-7. 

3.1  Source  materials 

Since  speech  is  the  signal  of  interest  in  the  hearing  aid  application,  it  will  be  necessary 
to evaluate systems using speech or speech-like signals. When possible, initial
assessments use uncorrelated zero-mean white Gaussian noise for the target and jammer
signals.  When  more  realistic  source  signals  are  required,  the  target  signal  consists 
of  a  series  of  phonetically-balanced  sentences  (IEEE,  1969)  spoken  by  a  single  male 
talker  and  the  jammer  signal  consists  of  12-talker  SPIN  babble  (Kalikow  et  al.,  1977). 
These  sources  were  obtained  from  anechoic  recordings  that  were  digitized,  sampled 
at 10 kHz, and approximately whitened with high-frequency emphasis of 6 dB/octave
(Link  and  Buckley,  1993). 


3.2  Room  simulations 

Convolving  the  source  materials  with  source-to-microphone  impulse  responses  pro¬ 
duces  microphone  signals  appropriate  for  input  to  the  systems  of  interest.  Those 
impulse responses can be obtained from recordings made in real rooms, or from a
room simulation. This work uses source-to-microphone impulse responses generated
by a simulation of free-space microphones in a rectangular room with specifiable
dimensions and uniform surface absorption (Peterson, 1986).

The  following  analyses  and  simulations  utilize  a  single  room.  Its  dimensions  are 
5.2 × 3.4 × 2.8 meters. This is slightly larger than the ‘living room’ used by Peterson
(1989).  One  corner  of  the  room  is  the  origin  of  a  three  dimensional  coordinate  system 
with  the  room  oriented  squarely  with  three  orthogonal  planes.  The  center  of  the  array 
is  at  (2.755,  1.380,  1.600)  meters,  and  the  array  is  oriented  along  the  straight  line 
defined by the array center and the point (2.685, 1.400, 1.600) meters. All sources
are  located  around  the  array  center  in  a  circle  in  the  horizontal  plane  with  radius 
0.9  meters  at  a  height  of  1.7  meters.  A  source  at  zero  degrees  is  located  at  array 
broadside  in  the  direction  of  positive  coordinates  from  the  array,  and  positive  source 
angles  refer  to  clockwise  progression  from  zero  when  viewed  from  above. 

Two  linear  broadside  arrays  of  omnidirectional  elements  will  be  investigated.  The 
first is 7 cm in length with two microphones. This array was selected because of
its  promising  performance  in  earlier  work  (Greenberg  and  Zurek,  1992),  and  because 
of  a  desire  to  investigate  a  relatively  simple  system  (two  microphones).  The  second 
array  is  16  cm  in  length  with  5  microphones  uniformly  spaced,  resulting  in  4  cm 
intermicrophone  spacing.  The  array  length  was  selected  to  be  roughly  ‘head-sized’, 
and  the  number  of  microphones  was  selected  to  prevent  spatial  undersampling  for 
frequencies  below  5  kHz  when  evenly  spaced  throughout  that  length. 

These arrays are simulated in free space, but in real applications they will be placed
on  or  near  the  listener’s  head.  Although  the  presence  of  the  head  affects  the  structure 
of  the  signals  received  at  the  array,  it  has  little  effect  on  the  resulting  performance 
of  broadside  arrays  (Greenberg  and  Zurek,  1992;  Soede  et  al.,  1993a).  Previous  work 
(Greenberg  and  Zurek,  1992)  has  shown  that  endfire  arrays  are  much  more  sensitive 
to  the  presence  of  the  head;  therefore  this  work  only  considers  broadside  arrays. 

For  both  the  two-  and  five-microphone  arrays  described  above,  three  values  of 
the uniform surface absorption are used in the room simulation to generate
source-to-microphone impulse responses with different degrees of reverberation. The three
absorption values are 1.0 (anechoic), 0.6, and 0.2. For the room described above,
the moderately reverberant room (absorption 0.6) had a direct-to-reverberant ratio of +6 dB and a
reverberation time of 150 ms, while the more strongly reverberant room (absorption 0.2) had a
direct-to-reverberant ratio of −2 dB and a reverberation time of 620 ms. Representative
source-to-microphone impulse responses are shown in a later chapter (Fig. 6.2).


3.3  Performance  metric 

Since the purpose of these systems is to improve the intelligibility of speech, the
ultimate test of their effectiveness comes from tests of intelligibility with human subjects.
However, such tests are time-consuming and do not allow rapid evaluations of many
algorithms and parameter choices. For efficiency, previous work has used a
physical measure, the intelligibility-weighted gain, for preliminary assessment of system
performance. This section summarizes the computation of the intelligibility-weighted
gain, denoted $G_I$, as described elsewhere (Peterson, 1989; Greenberg and Zurek, 1992;
Greenberg et al., 1993).

$G_I$ is based on the intelligibility-weighted level, $\Gamma(s)$, given by

$$\Gamma(s) = \sum_j a_j B_j(s), \qquad (3.1)$$

where $B_j(s)$ is the decibel level in the $j$th frequency band of the signal $s$ and $a_j$ is the
weight reflecting the contribution of that band to intelligibility. In principle, these
measures can be based on any index designed to predict intelligibility. In this work,
$G_I$ is based on the Articulation Index (ANSI, 1969; Kryter, 1962) with the weights,
$a_j$, reflecting the contribution of each one-third-octave band to intelligibility.

The absolute values of these intelligibility-weighted levels depend on the reference
level and are therefore arbitrary. However, they can be used to make comparisons
between signals. For these comparisons, there are four signals of interest: target
input, $T_i$; target output, $T_o$; jammer input, $J_i$; and jammer output, $J_o$. The improvement
from input to output in target and jammer is given by

$$\Delta\Gamma(T) = \Gamma(T_o) - \Gamma(T_i) \qquad (3.2)$$

and

$$\Delta\Gamma(J) = \Gamma(J_i) - \Gamma(J_o). \qquad (3.3)$$

Positive values indicate improved intelligibility (amplification of the target or
attenuation of the jammer), while negative values indicate degraded intelligibility
(attenuation of the target or amplification of the jammer). $G_I$ is then given by the overall
intelligibility-weighted gain in TJR from input to output,

$$G_I = \Delta\Gamma(T) + \Delta\Gamma(J) \qquad (3.4)$$

$$\phantom{G_I} = \Gamma(T_o) - \Gamma(T_i) + \Gamma(J_i) - \Gamma(J_o). \qquad (3.5)$$

Other useful measures are obtained by combining values of $\Gamma$ for the input and
output components separately, resulting in measures of intelligibility-weighted
target-to-jammer ratio at the input and output of the system. These measures are given
by

$$\mathrm{TJR}_{I,\mathrm{in}} = \Gamma(T_i) - \Gamma(J_i) \qquad (3.6)$$

and

$$\mathrm{TJR}_{I,\mathrm{out}} = \Gamma(T_o) - \Gamma(J_o), \qquad (3.7)$$

and can also be combined to determine the intelligibility-weighted gain according to

$$G_I = \mathrm{TJR}_{I,\mathrm{out}} - \mathrm{TJR}_{I,\mathrm{in}}. \qquad (3.8)$$
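The definitions above reduce to a few weighted sums once the band levels are known. The sketch below is illustrative only: it takes band levels in dB as inputs rather than signals, and the four-band layout, the uniform weights, and the level values are hypothetical placeholders, not Articulation Index weights or data from this work. It computes the gain both as the sum of the target and jammer improvements and as the difference of output and input TJR, which should agree.

```python
import numpy as np

def weighted_level(band_levels_db, weights):
    """Intelligibility-weighted level: sum over bands of a_j * B_j(s)."""
    return float(np.dot(weights, band_levels_db))

def intelligibility_weighted_gain(Ti, To, Ji, Jo, weights):
    """Return the gain computed two ways: improvements summed, and TJR difference."""
    d_target = weighted_level(To, weights) - weighted_level(Ti, weights)  # target improvement
    d_jammer = weighted_level(Ji, weights) - weighted_level(Jo, weights)  # jammer improvement
    tjr_in = weighted_level(Ti, weights) - weighted_level(Ji, weights)    # input TJR
    tjr_out = weighted_level(To, weights) - weighted_level(Jo, weights)   # output TJR
    return d_target + d_jammer, tjr_out - tjr_in

# Hypothetical band levels (dB) in four bands with uniform weights.
weights = np.full(4, 0.25)
Ti = np.array([60.0, 58.0, 55.0, 50.0])   # target at input
To = np.array([60.0, 58.0, 54.0, 49.0])   # target at output
Ji = np.array([62.0, 60.0, 57.0, 52.0])   # jammer at input
Jo = np.array([55.0, 52.0, 50.0, 47.0])   # jammer at output

g_from_improvements, g_from_tjr = intelligibility_weighted_gain(Ti, To, Ji, Jo, weights)
```

With these placeholder numbers the target loses 0.5 dB and the jammer is attenuated by 6.75 dB, so both routes give a gain of 6.25 dB.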

In order to calculate $G_I$, it is necessary to obtain separate target and jammer
signals at the system output. This is accomplished with a controlling processor and two
yoked processors (Greenberg and Zurek, 1992). The controlling processor operates
on the total input signal (target plus jammer) while each yoked processor has the
same structure as the controller, and processes either the target or jammer signal, $T_i$
or $J_i$. The adaptive filter weights of the yoked processor are copied exactly from the
controlling processor. Because the filtering operation is linear, superposition holds
and the total system output provided by the controlling processor equals the sum of
the two yoked processor outputs, $T_o$ and $J_o$.
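The superposition argument can be checked numerically. The sketch below assumes a simple primary-minus-filtered-reference structure with time-varying FIR weights; the random-walk weight trajectory is a hypothetical stand-in for the weights that the controlling processor would produce by adapting on the total input, and the function names are illustrative.

```python
import numpy as np

def apply_time_varying_fir(x, weights):
    """out[n] = sum_k weights[n, k] * x[n - k], with x taken as zero before n = 0."""
    n_samples, n_taps = weights.shape
    padded = np.concatenate([np.zeros(n_taps - 1), x])
    out = np.empty(n_samples)
    for n in range(n_samples):
        window = padded[n:n + n_taps][::-1]   # x[n], x[n-1], ..., x[n-L+1]
        out[n] = weights[n] @ window
    return out

def processor_output(primary, reference, weights):
    """Primary channel minus the filtered reference channel."""
    return primary - apply_time_varying_fir(reference, weights)

rng = np.random.default_rng(1)
n_samples, n_taps = 500, 8
# Stand-in for the weight trajectory copied from the controlling processor.
weights = np.cumsum(0.01 * rng.standard_normal((n_samples, n_taps)), axis=0)

t_primary, t_reference = rng.standard_normal((2, n_samples))   # target-only inputs
j_primary, j_reference = rng.standard_normal((2, n_samples))   # jammer-only inputs

total = processor_output(t_primary + j_primary, t_reference + j_reference, weights)
T_o = processor_output(t_primary, t_reference, weights)   # yoked processor, target only
J_o = processor_output(j_primary, j_reference, weights)   # yoked processor, jammer only
```

Because every operation applied to the inputs is linear once the weights are fixed, `total` equals `T_o + J_o` up to floating-point rounding, regardless of how the weights were obtained.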

Using additional yoked processors, this approach can be extended to investigate
the effect of the system on other components of the input signals. For instance, by
separating the direct wave from the reflections of the source-to-microphone impulse
responses and convolving them individually with the same target source material, it
is possible to obtain the direct and reflected target at the inputs, $T_{d,i}$ and $T_{r,i}$, where
$T_i = T_{d,i} + T_{r,i}$. Using these signals as the inputs to additional yoked processors
produces the direct and reflected target at the output, $T_{d,o}$ and $T_{r,o}$, where
$T_o = T_{d,o} + T_{r,o}$. Applying intelligibility weighting to these signals provides an indication
of how the system affects the direct target and the reflected target individually, that
is,

$$\Delta\Gamma(T_d) = \Gamma(T_{d,o}) - \Gamma(T_{d,i}) \qquad (3.9)$$

and

$$\Delta\Gamma(T_r) = \Gamma(T_{r,o}) - \Gamma(T_{r,i}). \qquad (3.10)$$

However, even though $T_i = T_{d,i} + T_{r,i}$ and $T_o = T_{d,o} + T_{r,o}$, $\Delta\Gamma(T) \neq \Delta\Gamma(T_d) + \Delta\Gamma(T_r)$
because $\Gamma(s)$ is a nonlinear function of the signal $s$. Also, these values will not
be meaningful if the output includes cancellation of direct target based on target
reflections.
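A single-band numerical illustration of this nonlinearity follows, under assumptions that are not made in the text: one band with weight 1, and direct and reflected components that are uncorrelated so that their powers add. Suppose the direct and reflected target have equal power $P$ at the input, and the system passes the direct target unchanged while attenuating the reflected target by 10 dB.

```latex
\begin{align*}
\Delta\Gamma(T_d) &= 10\log_{10}\frac{P}{P} = 0\ \mathrm{dB}, \\
\Delta\Gamma(T_r) &= 10\log_{10}\frac{0.1P}{P} = -10\ \mathrm{dB}, \\
\Delta\Gamma(T) &= 10\log_{10}\frac{P + 0.1P}{P + P} = 10\log_{10}(0.55) \approx -2.6\ \mathrm{dB}
  \neq \Delta\Gamma(T_d) + \Delta\Gamma(T_r).
\end{align*}
```

The decibel level of a sum is dominated by its stronger component, so the change in the total target level is much smaller than the sum of the component changes.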

As defined here, the sign of $\Delta\Gamma(T_r)$ suggests that target reflections contribute to
intelligibility. In fact, early reflections (arriving within up to 50-95 ms of the direct
wave) tend to improve intelligibility while later reflections tend to be detrimental to
intelligibility (Cremer and Muller, 1982). To obtain more accurate treatment of the
effect of late reflections, the calculation of intelligibility-weighted levels (3.1) could be
modified to include the Speech Transmission Index (Steeneken and Houtgast, 1980;
Houtgast et al., 1980). Using the Articulation Index and computing $G_I$ according
to (3.5) treats all target reflections as if they contribute to intelligibility. If only
early reflections exist, then $G_I$ is an accurate measure of the effect of the system on
intelligibility. On the other hand, if only late reflections exist, then $G_I$ underestimates
intelligibility if the system attenuates the reflections ($\Delta\Gamma(T_r) < 0$), and overestimates
intelligibility if the system amplifies the reflections ($\Delta\Gamma(T_r) > 0$). In reality, the target
reflections consist of both early and late reflections, and the results in Ch. 7 show
that $\Delta\Gamma(T_r)$ is typically negative. Therefore, $G_I$ is at best accurate, and at worst a
conservative estimate of system performance.

Finally,  for  assessing  the  performance  of  fixed  systems  in  extreme  reverberation 
approaching  an  isotropic  field,  it  is  useful  to  apply  intelligibility-weighting  to  the 
directivity  index  (Peterson,  1989).  The  directivity  index  is  defined  as  the  ratio  of  the 
output power due to sounds from the target direction to the average output power
due to sounds incident from all directions. Since the directivity index typically varies
with frequency, a useful measure is the broadband intelligibility-weighted directivity,

$$D_I = \sum_j a_j D_j, \qquad (3.11)$$

where $D_j$ is the directivity index corresponding to the $j$th frequency band in units of
decibels and $a_j$ is the weight reflecting the contribution of that band to intelligibility
as in (3.1).

Chapter 4


Optimal  step-size  parameter  for 
the  LMS  algorithm. 


4.1  Introduction 

This  chapter  is  concerned  with  determining  the  optimal  step-size  parameter  to  use 
with  the  LMS  algorithm.  For  simplicity,  the  analysis  is  performed  for  an  adaptive 
noise canceller, but the results apply to the generalized sidelobe canceller
implementation of LCMV beamforming, because, as explained in Sec. 2.1, the generalized
sidelobe  canceller  can  be  thought  of  as  a  preprocessor  followed  by  an  adaptive  noise 
canceller. 

The optimal time-varying step-size parameter is defined as one that minimizes the
steady-state  excess  mean-squared  error  (mse)  due  to  the  adaptive  process.  Realizable 
expressions  for  the  step-size  parameter  based  on  quantities  available  to  the  adaptive 
processor  are  developed  and  their  performance  is  compared  to  that  obtained  with 
the  optimal  (non-realizable)  step-size  parameter  result.  In  addition,  the  resulting 
convergence  time  is  determined  for  the  optimal  and  realizable  step-size  parameters. 

The  analysis  begins  with  expressions  for  the  behavior  of  the  LMS  algorithm  that 
are  available  in  the  literature;  these  expressions  are  modified  to  include  the  effects  of 
the  target  signal  in  the  primary  input  to  the  adaptive  noise  canceller.  It  is  shown 
that the traditional method of normalizing the step-size parameter leads to poor
performance in the presence of strong target signals. Next, an expression is derived for
the step-size parameter that minimizes the steady-state excess mse of the adaptive
noise canceller’s output. However, this optimal step-size parameter cannot be
implemented in a real system, since it requires knowledge of quantities that are not
available. Instead, simplifying assumptions are made about the unknown quantities
in order to obtain expressions for the step-size parameter that can be implemented
in a real system. Three different expressions for the step-size parameter result from
different  sets  of  assumptions. 

The  analysis  described  above  produces  five  candidates  for  the  step-size  parameter 
in  the  LMS  algorithm.  The  first  is  the  value  traditionally  used,  which  ignores  the 
presence of the target signal in the adaptive noise canceller’s primary input. The
second is the optimal value, which is of theoretical interest but cannot be implemented
in a real system. The last three are the step-size parameters derived from the
optimal expression based on different simplifying assumptions. For each of these five
expressions  for  the  step-size  parameter  (traditional,  optimal,  and  three  methods  based 
on  simplifying  assumptions)  expressions  are  derived  to  characterize  the  steady-state 
excess  mse  and  the  transient  behavior. 

The  results  are  summarized  in  Table  4.1  on  page  53.  They  show  that  the  optimal 
step-size  parameter  results  in  a  steady-state  excess  mse  equal  to  zero.  The  traditional 
step-size  parameter  results  in  a  steady-state  excess  mse  that  increases  linearly  with 
target  signal  power.  The  three  new  expressions  for  the  step-size  parameter  result  in 
values  of  steady-state  excess  mse  that  are  nonzero,  but  preferable  to  the  traditional 
method.  Iterative  expressions  are  determined  to  characterize  the  transient  behavior 
of the adaptive process. These expressions permit comparisons between the
different step-size parameter algorithms, but the expressions depend on relative signal
strengths and the spread of the eigenvalues of the autocorrelation matrix of the
reference signal. For the most promising method, time constants are determined for
exponential decays that approximate the transient behavior.

4.2 Background


4.2.1  The  adaptive  noise  canceller 

A block diagram of the adaptive noise canceller was shown in Fig. 2.1.^ The adaptive
noise canceller requires two inputs. The primary input contains target plus jammer,
denoted by $t(n) + j(n)$, where $n$ is the discrete-time index. The reference input
contains a filtered version of the jammer, $x(n)$, and is (ideally) free of target. The
reference signal passes through an $L$-tap adaptive FIR filter, whose weights, $w_k(n)$
for $k = 0, \ldots, L-1$, are adjusted to minimize the power in the output signal. This
minimization is achieved by filtering $x(n)$ to approximate $j(n)$ and subtracting it
from the primary signal. With the primary delay equal to zero, the output of the
adaptive noise canceller, $y(n)$, is given by

$$y(n) = t(n) + j(n) - \sum_{k=0}^{L-1} w_k(n)\,x(n-k). \qquad (4.1)$$

If the target and jammer are uncorrelated and the reference input contains no target,
then minimizing the power in $y(n)$ results in an output signal with $t(n)$ perfectly
preserved and the jammer power minimized.

The  analysis  presented  here  is  based  on  the  following  assumptions: 

1.  The primary target and jammer signals are uncorrelated; $t(n)$ is uncorrelated
with $j(n)$.

2.  There is no leakage of target signal into the reference input; $t(n)$ is uncorrelated
with $x(n)$.

3.  The signals $t(n)$, $j(n)$, and $x(n)$ are all real and zero-mean.

4.  The signals $t(n)$, $j(n)$, and $x(n)$ are wide-sense stationary, that is, their
second-order statistics are constant. This restriction is necessary to simplify the
derivation, but will be lifted in the subsequent interpretation of the results.

^In the following analysis, the primary channel delay, $D$, is set to zero, but the results are
applicable when the delay is nonzero as well.

The following definitions will simplify notation. Boldface lowercase and uppercase
letters represent vectors and matrices, respectively, while $^T$ denotes transpose and
$E[\,\cdot\,]$ denotes expected value. The signal powers are defined as

$$\sigma_t^2 = E[t^2(n)], \qquad (4.2)$$

$$\sigma_j^2 = E[j^2(n)], \qquad (4.3)$$

$$\sigma_x^2 = E[x^2(n)]. \qquad (4.4)$$

The data vector is

$$\mathbf{x}(n) = [x(n) \;\; x(n-1) \;\; \cdots \;\; x(n-(L-1))]^T, \qquad (4.5)$$

with elements equal to the values in the tapped delay line of the adaptive filter. The
weight vector is

$$\mathbf{w}(n) = [w_0(n) \;\; w_1(n) \;\; \cdots \;\; w_{L-1}(n)]^T. \qquad (4.6)$$

The data autocorrelation matrix and cross-correlation vector are given by

$$E[\mathbf{x}(n)\mathbf{x}^T(n)] = \mathbf{R} \qquad (4.7)$$

and

$$E[\mathbf{x}(n)j(n)] = \mathbf{p}, \qquad (4.8)$$

while from the second assumption,

$$E[\mathbf{x}(n)t(n)] = \mathbf{0}, \qquad (4.9)$$

where $\mathbf{0}$ is the vector of $L$ zeros. The autocorrelation matrix $\mathbf{R}$ is symmetric, Toeplitz,
and positive definite (Haykin, 1986). The eigenvalues of $\mathbf{R}$ are positive and are
denoted $\lambda_i$ for $i = 1, \ldots, L$. The diagonal entries of $\mathbf{R}$ equal $\sigma_x^2$ and, from the
definition of the trace of a matrix (Strang, 1988),

$$\sum_{i=1}^{L} \lambda_i = \mathrm{tr}[\mathbf{R}] = L\sigma_x^2, \qquad (4.10)$$

where $\mathrm{tr}[\,\cdot\,]$ denotes the trace of a matrix.

The  optimal  values  for  the  adaptive  weights,  w*,  are  given  by  the  solution  to  the 
Wiener-Hopf  equation  (Widrow  and  Stearns,  1985), 

$$\mathbf{w}^* = \mathbf{R}^{-1}\mathbf{p}. \qquad (4.11)$$
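As a numerical illustration of these definitions, the sketch below estimates the autocorrelation matrix and cross-correlation vector from sample averages and solves for the optimal weights. The white reference signal, the 4-tap jammer path `h`, and the sample count are hypothetical choices for the example; sample averages stand in for the expectations.

```python
import numpy as np

def data_matrix(x, n_taps):
    """Rows are data vectors [x(n), x(n-1), ..., x(n-L+1)], zero before n = 0."""
    padded = np.concatenate([np.zeros(n_taps - 1), x])
    return np.stack([padded[n:n + n_taps][::-1] for n in range(len(x))])

rng = np.random.default_rng(0)
n_samples, n_taps = 20000, 4
x = rng.standard_normal(n_samples)        # reference input (white, roughly unit power)
h = np.array([0.8, -0.4, 0.2, 0.1])       # hypothetical path from reference to primary jammer
X = data_matrix(x, n_taps)
j = X @ h                                 # jammer component of the primary input
t = rng.standard_normal(n_samples)        # target, generated independently of x and j

R = X.T @ X / n_samples                   # sample estimate of the autocorrelation matrix
p = X.T @ j / n_samples                   # sample estimate of the cross-correlation vector
xt = X.T @ t / n_samples                  # sample cross-correlation of reference and target
w_star = np.linalg.solve(R, p)            # solution of the Wiener-Hopf equation

eigenvalues = np.linalg.eigvalsh(R)       # R is symmetric, so eigvalsh applies
```

In this constructed case the jammer is an exact FIR-filtered copy of the reference, so `w_star` recovers `h`. The eigenvalues sum to the trace of `R`, which is close to `n_taps` times the reference power, and `xt` is close to zero, as the second assumption requires.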

The  error  signal,  e(n),  is  the  difference  between  the  actual  output  and  the  desired 
output,  that  is, 

e{n)  =  y{n)  -  t{n)  =  j(n)  -  w^(n)x(TO),  (4.12) 

where  vector  multiphcation  has  replaced  the  summation  in  (4.1).  The  minimum  error 
(in  the  mean-squared  sense)  is  obtained  when  the  adaptive  weights  are  fixed  at  their 
optimal  values  so  that 

ei„m(n)  =  j{n)  -  w*^x(7i).  (4.13) 

The  minimum  error  is  uncorrelated  with  the  reference  input,  that  is, 

E[x(7J.)enun(72')]  =  0  (4-14) 

(Haykin,  1986).  The  mean-squared  error  associated  with  a  particular  weight  vector, 
J(w),  is  given  by 


J(w)  =  S[e=*(n)|w(n)=w]  =  -f  w^Rw  (4.15) 

and  the  minimum  mse  is 

=  £[e“(u)Uw.w-l  =  E[eLi.(»)l  =  <'1  -  (4.16) 


(Haykin, 1986).

4.2.2 The LMS algorithm

Definitions

The problem of determining the optimal value of the adaptive weight vector, $\mathbf{w}^*$, can
be interpreted geometrically by considering the mean-squared error as a function of
the filter weights, $J(\mathbf{w})$. This results in a concave-upward hyperparaboloid in
$(L+1)$-dimensional space (Widrow and Stearns, 1985). The minimum of the hyperparaboloid
corresponds to $J_{min}$. Gradient search algorithms operate by determining or estimating
the gradient of the error surface for the current value of the adaptive weights and
iteratively modifying the weights to travel in the direction of the negative gradient in
an attempt to reach the “bottom of the bowl”. The (real) LMS algorithm (Widrow
and Stearns, 1985) is a simple gradient search that uses the instantaneous value $y^2(n)$
as an estimate of $E[e^2(n)]$, which is equivalent to using the instantaneous values of
$\mathbf{x}(n)\mathbf{x}^T(n)$ and $\mathbf{x}(n)(t(n) + j(n))$ as crude estimates of their expected values, $\mathbf{R}$ and
$\mathbf{p}$, respectively, required for the true gradient. The resulting weight update equation
for the LMS algorithm is

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, y(n)\mathbf{x}(n), \tag{4.17}$$

where $y(n)\mathbf{x}(n)$ is an estimate of the negative gradient, and the parameter $\mu$ controls
the size of the adaptive steps and has units of inverse power.
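The update in (4.17) can be illustrated with a minimal sketch. The following Python fragment is only an illustration under stated assumptions: the function names, the two-tap jammer path `h`, the white Gaussian reference input, and the absence of a target signal are all hypothetical choices made for the example and are not taken from the text.

```python
import random

def lms_step(w, x_vec, y, mu):
    """One LMS update, w(n+1) = w(n) + mu * y(n) * x(n), as in (4.17)."""
    return [wi + mu * y * xi for wi, xi in zip(w, x_vec)]

def run_noise_canceller(h, mu, n_iter, seed=0):
    """Adapt an L-tap canceller against a jammer j(n) = h^T x(n).

    Assumptions (illustrative only): no target signal, unit-variance
    white Gaussian reference input, and a jammer path h of length L.
    """
    rng = random.Random(seed)
    L = len(h)
    w = [0.0] * L
    x_vec = [0.0] * L                      # tapped delay line [x(n), x(n-1), ...]
    for _ in range(n_iter):
        x_vec = [rng.gauss(0.0, 1.0)] + x_vec[:-1]
        j = sum(hi * xi for hi, xi in zip(h, x_vec))      # jammer in the primary input
        y = j - sum(wi * xi for wi, xi in zip(w, x_vec))  # system output
        w = lms_step(w, x_vec, y, mu)
    return w

weights = run_noise_canceller(h=[0.5, -0.3], mu=0.05, n_iter=2000)
```

In this noise-free setting the weights approach the assumed jammer path; with a target signal present the weights would instead fluctuate about their optimal values.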

Despite the widespread use of the LMS algorithm, there is no unconditional proof
of its convergence (Widrow and Stearns, 1985). All known convergence proofs for the
LMS algorithm require certain assumptions about the statistics of the inputs in order
to make analysis of the algorithm mathematically tractable. One widely used set of
assumptions is independence theory, which assumes the independence of successive
data vectors. Using independence theory, it follows that the current weight vector
depends on past values of the inputs, but is independent of the current inputs. The
assumptions of independence theory are violated for many practical problems, including
the adaptive noise canceller. Despite the violation of these assumptions, results
predicted using independence theory are usually found to be in excellent agreement
with experiments and computer simulations (Haykin, 1986, p. 239). The assumptions
of independence theory are used as needed throughout the remainder of this work.
Using independence theory, the LMS algorithm converges if


$$0 < \mu < \frac{2}{L\sigma_x^2} \tag{4.18}$$

(Widrow and Stearns, 1985). Typically, the step-size parameter is defined in terms
of the dimensionless step-size parameter, $\alpha$, which is related to $\mu_{trad}$² by

$$\mu_{trad} = \frac{\alpha}{L\sigma_x^2}. \tag{4.19}$$

Combining (4.18) and (4.19), the LMS algorithm converges for values of the
dimensionless step-size parameter in the range

$$0 < \alpha < 2. \tag{4.20}$$

²The subscript is used to distinguish the traditional method of computing $\mu$ from the methods
that will be proposed in Sec. 4.3.

A closely related method is the normalized LMS (NLMS) algorithm, proposed
by Nagumo and Noda (1967). The weights of the NLMS algorithm are updated
according to

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\alpha}{\mathbf{x}^T(n)\mathbf{x}(n)}\, y(n)\mathbf{x}(n), \tag{4.21}$$

where $\alpha$ is the dimensionless step-size parameter. The algorithm converges for values
of the step-size parameter in the range $0 < \alpha < 2$. Comparing (4.21) to (4.17) reveals
that the two algorithms are equivalent when

$$\mu = \frac{\alpha}{\mathbf{x}^T(n)\mathbf{x}(n)}. \tag{4.22}$$

Noting that $E[\mathbf{x}^T(n)\mathbf{x}(n)] = L\sigma_x^2$, comparing (4.22) with (4.19) reveals that the
NLMS algorithm is equivalent to the LMS algorithm with the step-size parameter
normalized according to (4.19) if the reference signal power, $\sigma_x^2$, is estimated from the
current data vector, $\mathbf{x}(n)$, at each iteration.
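As a rough illustration of the NLMS update, the sketch below normalizes the step by the energy of the current data vector, which is how (4.21) is read here. The function name, the small constant `eps` guarding against division by zero, the two-tap jammer path `h`, and the noise-free test signal are assumptions introduced for the example only.

```python
import random

def nlms_step(w, x_vec, y, alpha, eps=1e-12):
    """One NLMS update: w(n+1) = w(n) + alpha * y(n) * x(n) / (x^T(n) x(n)).

    eps is a hypothetical guard against an all-zero delay line; it is not
    part of the algorithm as described in the text.
    """
    energy = sum(xi * xi for xi in x_vec) + eps
    return [wi + alpha * y * xi / energy for wi, xi in zip(w, x_vec)]

rng = random.Random(1)
h = [0.5, -0.3]            # assumed jammer path (illustrative)
w = [0.0, 0.0]
x_vec = [0.0, 0.0]
for _ in range(500):
    x_vec = [rng.gauss(0.0, 1.0)] + x_vec[:-1]
    # With no target, the output is y(n) = j(n) - w^T x(n) = (h - w)^T x(n).
    y = sum((hi - wi) * xi for hi, wi, xi in zip(h, w, x_vec))
    w = nlms_step(w, x_vec, y, alpha=0.5)
```

Because the normalization uses the instantaneous energy of the delay line, no separate estimate of the reference power is needed in this form.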


Performance  analysis 

When any adaptive algorithm is used, the weights vary with time, as does the associated
mse. An expression for $J(n)$, the mse due to the LMS algorithm at time $n$, can
be found by squaring (4.12) and taking its expected value,³ producing

$$J(n) = E[e^2(n)] = J_{min} + E[\mathbf{v}^T(n)\mathbf{R}\mathbf{v}(n)], \tag{4.23}$$

where the weight error vector, $\mathbf{v}(n)$, is defined as

$$\mathbf{v}(n) = \mathbf{w}(n) - \mathbf{w}^*. \tag{4.24}$$

The excess mse, $J_{ex}(n)$, is

$$J_{ex}(n) = J(n) - J_{min} = E[\mathbf{v}^T(n)\mathbf{R}\mathbf{v}(n)], \tag{4.25}$$

and is nonzero when the weights deviate from their optimal values.⁴ Defining the
weight error correlation matrix,

$$\mathbf{K}(n) = E[\mathbf{v}(n)\mathbf{v}^T(n)], \tag{4.26}$$

and using the property $\mathrm{tr}[\mathbf{A}\mathbf{B}] = \mathrm{tr}[\mathbf{B}\mathbf{A}]$, yields

$$J_{ex}(n) = \mathrm{tr}[\mathbf{R}\mathbf{K}(n)] \tag{4.27}$$

(Haykin, 1986, Eq. 5.80). It can be shown that the expected value of the system
output power equals the sum of the target signal power, the minimum mse, and the excess
mse, that is,

$$E[y^2(n)] = \sigma_t^2 + J_{min} + J_{ex}(n). \tag{4.28}$$

³Although (4.15) and (4.23) are both derived from (4.12), the two equations differ in that (4.15)
is the constant mse based on an arbitrary, fixed weight vector, while (4.23) is the time-varying mse
corresponding to the sequence of weight vectors determined by the LMS algorithm.

⁴Note that the expectations in (4.23) and (4.25) are not expectations over time. Rather, they
correspond to an ensemble average based on different input sequences selected at random from the
same statistical population. Haykin (1986) uses the notation $J(n)$ and $J_{ex}(n)$ to denote the value
of the error based on the instantaneous weight vector and the expected value of the input vector,
that is, $J_{ex}(n) = \mathbf{v}^T(n)\mathbf{R}\mathbf{v}(n)$, and then gives later results in terms of $E[J_{ex}(n)]$. In a subsequent
edition, Haykin (1991) defines $J(n) = E[e^2(n)]$, which leads to $J_{ex}(n) = E[\mathbf{v}^T(n)\mathbf{R}\mathbf{v}(n)]$. The latter
definition is used in this work, and appropriate substitutions are made when reproducing expressions
from Haykin (1986).


A useful measure of steady-state performance is the steady-state excess mse,
$J_{ex}(\infty)$. This quantity is nonzero for the LMS algorithm, because it reflects the
error due to the ongoing adaptive process, that is, the fluctuation of the weights
about their optimal values after they have converged in the mean. The steady-state
ratio of excess mse to minimum mse, $J_{ex}(\infty)/J_{min}$, is referred to as misadjustment. When the
step-size parameter, $\mu$, is small, both $J_{ex}(\infty)$ and misadjustment are proportional to
$\mu$. However, it is not possible to make these quantities arbitrarily small by reducing
$\mu$, because the convergence time of the LMS algorithm is inversely proportional to $\mu$.
Selection of the step-size parameter, $\mu$, in the LMS algorithm represents a fundamental
tradeoff between convergence time and steady-state error (Widrow and Stearns,
1985).

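An expression for the steady-state excess mse, $J_{ex}(\infty)$, for the traditional LMS
algorithm with no target signal present is given by

$$J_{ex}(\infty) = \frac{J_{min}\,\mu\sum_{i=1}^{L}\lambda_i}{2 - \mu\sum_{i=1}^{L}\lambda_i} = \frac{J_{min}\,\mu\,\mathrm{tr}[\mathbf{R}]}{2 - \mu\,\mathrm{tr}[\mathbf{R}]} \tag{4.29}$$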


(Nehorai and Malah, 1980; Haykin, 1986, Eq. 5.108). One modification to (4.29) is
required before it can be applied to the adaptive noise canceller. The presence of the
target signal, $t(n)$, is standard for the adaptive noise canceller configuration, but is
in contrast to the typical problem formulation for adaptive transversal filters and the
usual assumptions governing derivation of the LMS algorithm. It can be shown that
the presence of target in the primary signal does not affect the convergence of the
mean weights to their optimal values, but the target signal does introduce additional
noise in the gradient estimates, thereby affecting the weight-error correlation matrix
and the steady-state excess mse. Following a derivation similar to the one in Haykin
(1986) but including the target signal, $t(n)$, produces

$$J_{ex}(\infty) = \frac{\mu L\sigma_x^2\,(J_{min} + \sigma_t^2)}{2 - \mu L\sigma_x^2}. \tag{4.30}$$

This expression differs from (4.29) only by the inclusion of $\sigma_t^2$. It is intuitively
satisfying to see that the effects of the target signal and the minimum error signal
(represented by $J_{min}$) are the same, since both of these signals are uncorrelated with the
reference input [(4.9) and (4.14)], and both appear as noise in the adaptive process.

For the traditional method of calculating the step-size parameter, replacing $\mu$ in
(4.30) with (4.19) yields

$$J_{ex}(\infty) = \frac{\alpha\,(J_{min} + \sigma_t^2)}{2 - \alpha}. \tag{4.31}$$

This result shows that for the traditional method of calculating the step-size parameter,
the steady-state excess mse is proportional to the target signal power, rendering
the LMS algorithm ineffective in the presence of strong target signals. Although
this is recognized as a shortcoming of the LMS algorithm in applications with strong
target signals, the explicit relationship described by (4.31) is not well-known.
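The dependence on target power can be made concrete with a small numerical sketch. It reads (4.30) as $J_{ex}(\infty) = \mu L\sigma_x^2(J_{min} + \sigma_t^2)/(2 - \mu L\sigma_x^2)$ and (4.31) as $J_{ex}(\infty) = \alpha(J_{min} + \sigma_t^2)/(2 - \alpha)$; the function and argument names, and the parameter values in the loop, are hypothetical and chosen only for illustration.

```python
def excess_mse_fixed_mu(mu, L, p_ref, p_target, j_min):
    """Steady-state excess mse for a fixed step size mu, following (4.30)."""
    g = mu * L * p_ref
    if not 0.0 < g < 2.0:
        raise ValueError("mu must satisfy 0 < mu < 2 / (L * p_ref)")
    return g * (j_min + p_target) / (2.0 - g)

def excess_mse_traditional(alpha, p_target, j_min):
    """Steady-state excess mse for mu = alpha / (L * p_ref), following (4.31)."""
    return alpha * (j_min + p_target) / (2.0 - alpha)

# Illustrative values: the excess mse scales directly with target power.
alpha, L, p_ref, j_min = 0.2, 10, 1.0, 0.0
results = {p_t: excess_mse_traditional(alpha, p_t, j_min)
           for p_t in (0.01, 1.0, 100.0)}
```

With these values a 40 dB increase in target power raises the steady-state excess mse by the same 40 dB, which is the behavior the proposed methods aim to avoid.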


4.3  Proposed  methods  of  calculating  the  step-size 
parameter 

4.3.1 Derivation of the optimal step-size parameter

Choice of quantity to optimize

The goal is to optimize, in some sense, the step-size parameter, $\mu$, in equation (4.17).
This requires replacing the constant parameter, $\mu$, with a time-varying quantity $\mu(n)$.
The time-varying step-size parameter will be derived to minimize the expected value
of an error measure at each iteration. Selection of the particular error measure is
discussed below.

This approach is based on the modification commonly made when $\sigma_x^2$ is unknown
a priori or when the second-order statistics of $x(n)$ exhibit nonstationarities that vary
slowly with respect to $n$. In those cases, $\sigma_x^2$ is replaced by a running estimate of the
power in the reference signal, $\hat\sigma_x^2(n)$, and the constant $\mu$ in (4.17) is replaced by the
time-varying quantity

$$\mu(n) = \frac{\alpha}{L\hat\sigma_x^2(n)}, \tag{4.32}$$

where the dimensionless step-size parameter, $\alpha$, remains constant. The proposed
modifications will use a similar approach, employing estimates of additional signal
powers as necessary.

Another motivation for this approach is the effect of strong target signals on the
performance of the adaptive noise canceller. As shown in Sec. 4.2.2, the steady-state
excess mse, $J_{ex}(\infty)$, is proportional to the target power, $\sigma_t^2$, as well as to the
step-size parameter, $\mu$. If the target signal exhibits short-term power fluctuations (as is
characteristic of speech, for example), then the degrading effect of strong target signals
can be mitigated by normalizing the step-size parameter with a short-time estimate of
target power, so that the incremental adjustments to the adaptive weights are larger
in intervals when the target is weak and smaller when the target is strong.

In the following derivation, the error measure to be minimized is the expected
value of the total weight error power, $E[\mathbf{v}^T(n)\mathbf{v}(n)]$, or equivalently $\mathrm{tr}[\mathbf{K}(n)]$, which
is $L$ times the mean-squared weight error. Rather than minimizing this quantity, it
might be preferable to minimize the excess mean-squared output error $J_{ex}(n)$, which,
from (4.25) and (4.27), corresponds to minimizing $E[\mathbf{v}^T(n)\mathbf{R}\mathbf{v}(n)]$, or equivalently,
$\mathrm{tr}[\mathbf{K}(n)\mathbf{R}]$. However, the expression for $\mu$ obtained by minimizing either $E[\mathbf{v}^T(n)\mathbf{v}(n)]$
or $E[\mathbf{v}^T(n)\mathbf{R}\mathbf{v}(n)]$ is only of theoretical interest, since it will require quantities not
available to the adaptive processor. These two error measures have similar structures
and are equivalent when $\mathbf{R} = \mathbf{I}$,⁵ where $\mathbf{I}$ is the $L \times L$ identity matrix. It will
be shown that the step-size parameter that minimizes $E[\mathbf{v}^T(n)\mathbf{v}(n)]$ produces the
minimum possible steady-state excess mse, that is, $J_{ex}(\infty) = 0$, so any differences
that arise from minimizing the weight error power instead of the excess mse only
affect the transient behavior of $J_{ex}(n)$.

⁵If the reference input is a sequence of independent, identically-distributed random variables, as
explicitly assumed by Duttweiler (1982), then $\mathbf{R} = c\mathbf{I}$, where $c$ is an arbitrary constant.

Similar approaches designed to minimize the mean-squared weight error or
mean-squared output error have been proposed for other applications of the LMS algorithm
(Duttweiler, 1982; Sondhi and Berkley, 1980; Mikhael, 1986; Yassa, 1987). These
methods optimize the steady-state performance when the system is operating in a
stationary environment, that is, when the optimal weights, $\mathbf{w}^*$, do not vary with time.
They do not attempt to optimize performance in the presence of nonstationarities or
during transients. Because the system is continuously adapting, this results in nearly
optimal performance in slowly varying nonstationary environments, provided that
the nonstationarity is slow relative to the convergence time of the adaptive
filter.

Optimal step-size parameters based on error criteria that include performance
during transients and in nonstationary environments are available in the literature
for some cases. Examples of step-size parameters selected to optimize performance
during transients are presented by Horowitz and Senne (1981), Feuer and Weinstein
(1985), and Hsia (1983). Horowitz and Senne (1981) select the step-size parameter to
provide “fastest initial convergence.” Feuer and Weinstein (1985) derive a step-size
parameter that minimizes the quantity

$$C = \sum_{n=0}^{\infty}\left(J(n) - J(\infty)\right), \tag{4.33}$$

where small values of $C$ correspond to rapid convergence. Hsia (1983) minimizes the
convergence ratio,

$$\frac{\mathrm{tr}[\mathbf{v}^T(n+1)\mathbf{v}(n+1)]}{\mathrm{tr}[\mathbf{v}^T(n)\mathbf{v}(n)]}. \tag{4.34}$$

In nonstationary environments, an additional source of error arises from the weight
vector lag, that is, the difference between the current weights, $\mathbf{w}(n)$, and the optimal
weights, $\mathbf{w}^*(n)$, due to changes in the optimal weights. Optimized step-size parameters
for nonstationary environments are presented by Widrow et al. (1976), Hsia
(1983), and Gardner (1987). These methods minimize the total weight vector error
due to both weight vector lag and misadjustment from the noisy gradient estimate.

Finally, Fisher and Bershad (1983) and Bershad (1987) advocate selecting the
step-size parameter to provide “the smallest misadjustment error at the end of the
observation interval,” where the observation interval is the number of iterations. They
determine the optimal value of the step-size parameter empirically by plotting
misadjustment as a function of step-size parameter for a variety of observation
intervals, filter lengths, signal powers, and values of minimum mse. The approach taken
in the current work can be considered an analytical means of satisfying Bershad’s (1987)
criterion, provided that the ‘observation interval’ is sufficiently long that the system
has converged.

Optimization based on weight error power

Derivation of the optimal step-size parameter based on minimizing the trace of the
weight error correlation matrix, $\mathrm{tr}[\mathbf{K}(n)]$, at each iteration, first requires an expression
for the time evolution of $\mathbf{K}(n)$. The derivation of this expression is omitted here, but
can be found in Haykin (1986, pp. 221-225). The general idea is to use (4.1), (4.17),
and (4.24) to determine the time evolution of the weight error vector, $\mathbf{v}(n)$, and
then take the outer product of both sides of the equation according to (4.26). With
no target signal present, the resulting time evolution of the weight-error correlation
matrix is

$$\mathbf{K}(n+1) = \mathbf{K}(n) - \mu[\mathbf{R}\mathbf{K}(n) + \mathbf{K}(n)\mathbf{R}] + \mu^2\mathbf{R}\,\mathrm{tr}[\mathbf{R}\mathbf{K}(n)] + \mu^2 J_{min}\mathbf{R} \tag{4.35}$$

(Haykin, 1986, Eq. 5.74). Including the target signal and following the steps used to
derive (4.35) yields

$$\mathbf{K}(n+1) = \mathbf{K}(n) - \mu[\mathbf{R}\mathbf{K}(n) + \mathbf{K}(n)\mathbf{R}] + \mu^2\mathbf{R}\,\mathrm{tr}[\mathbf{R}\mathbf{K}(n)] + \mu^2(J_{min} + \sigma_t^2)\mathbf{R}. \tag{4.36}$$

Again, as discussed following (4.30), the effect of the target signal is the same as the
effect of the minimum error.

Given the state of the system at time $n$, minimization of the expected value of
the total weight error power at time $(n+1)$, $E[\mathbf{v}^T(n+1)\mathbf{v}(n+1)]$ or equivalently
$\mathrm{tr}[\mathbf{K}(n+1)]$, begins by taking the trace of (4.36), yielding

$$\mathrm{tr}[\mathbf{K}(n+1)] = \mathrm{tr}[\mathbf{K}(n)] - 2\mu\,\mathrm{tr}[\mathbf{R}\mathbf{K}(n)] + \mu^2\,\mathrm{tr}[\mathbf{R}]\left(\mathrm{tr}[\mathbf{R}\mathbf{K}(n)] + J_{min} + \sigma_t^2\right), \tag{4.37}$$

where the properties $\mathrm{tr}[\mathbf{A} + \mathbf{B}] = \mathrm{tr}[\mathbf{A}] + \mathrm{tr}[\mathbf{B}]$ and $\mathrm{tr}[\mathbf{A}\mathbf{B}] = \mathrm{tr}[\mathbf{B}\mathbf{A}]$ have been used.
The value of $\mu$ that minimizes the total power in the weight error at each iteration,
denoted $\mu^*(n)$, is found by taking the first derivative of (4.37) with respect to $\mu$,
setting it equal to zero, and solving for $\mu$, resulting in

$$\mu^*(n) = \frac{\mathrm{tr}[\mathbf{R}\mathbf{K}(n)]}{\mathrm{tr}[\mathbf{R}]\left(\mathrm{tr}[\mathbf{R}\mathbf{K}(n)] + J_{min} + \sigma_t^2\right)}. \tag{4.38}$$

It can be verified that this is in fact a minimum, because the second derivative is
positive.
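The minimization can be written out explicitly. Treating the quantities known at time $n$ as constants, the trace of the updated weight error correlation matrix is a quadratic in $\mu$, so the following steps (a worked restatement of the argument above, not an additional result) give the stationary point and confirm that it is a minimum:

```latex
\begin{align*}
\frac{\partial}{\partial\mu}\,\mathrm{tr}[\mathbf{K}(n+1)]
  &= -2\,\mathrm{tr}[\mathbf{R}\mathbf{K}(n)]
     + 2\mu\,\mathrm{tr}[\mathbf{R}]\left(\mathrm{tr}[\mathbf{R}\mathbf{K}(n)] + J_{min} + \sigma_t^2\right) = 0 \\
\Rightarrow\quad \mu^*(n)
  &= \frac{\mathrm{tr}[\mathbf{R}\mathbf{K}(n)]}
          {\mathrm{tr}[\mathbf{R}]\left(\mathrm{tr}[\mathbf{R}\mathbf{K}(n)] + J_{min} + \sigma_t^2\right)} \\
\frac{\partial^2}{\partial\mu^2}\,\mathrm{tr}[\mathbf{K}(n+1)]
  &= 2\,\mathrm{tr}[\mathbf{R}]\left(\mathrm{tr}[\mathbf{R}\mathbf{K}(n)] + J_{min} + \sigma_t^2\right) > 0
\end{align*}
```

The second derivative is positive because $\mathbf{R}$ is positive definite, so its trace is positive, and the remaining terms are nonnegative powers.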

Substituting (4.10), (4.27), and (4.28) into (4.38) yields

$$\mu^*(n) = \frac{J_{ex}(n)}{L\sigma_x^2\left(\sigma_t^2 + J_{min} + J_{ex}(n)\right)} = \frac{J_{ex}(n)}{L\sigma_x^2E[y^2(n)]}. \tag{4.39}$$

Finally, the constant $\alpha$ is introduced in (4.39) to facilitate comparison with other
algorithms, producing

$$\mu^*(n) = \frac{\alpha J_{ex}(n)}{L\sigma_x^2\left(\sigma_t^2 + J_{min} + J_{ex}(n)\right)} = \frac{\alpha J_{ex}(n)}{L\sigma_x^2E[y^2(n)]}. \tag{4.40}$$

This constant affects the convergence of the adaptive algorithm with optimized
step-size parameter, but does not affect the steady-state performance, as will be shown
below. Substituting (4.19) into (4.40) reveals that

$$\mu^*(n) = \mu_{trad}\,\frac{J_{ex}(n)}{E[y^2(n)]}. \tag{4.41}$$

$J_{ex}(n)$ represents output signal power that potentially could be cancelled, but remains
because the weights are at suboptimal values. Therefore, (4.41) can be interpreted
as stating that the optimal step-size parameter at each iteration equals the traditional
step-size parameter adjusted by the ratio of cancellable output signal power to total
output signal power.

For the optimal method of calculating the step-size parameter, substituting (4.40)
for $n = \infty$ into (4.30) and rearranging terms yields

$$(2 - \alpha)\,J_{ex}(\infty)\left(J_{ex}(\infty) + J_{min} + \sigma_t^2\right) = 0. \tag{4.42}$$

The nonnegative solution of (4.42) is

$$J_{ex}(\infty) = 0. \tag{4.43}$$

This result proves that, in the steady state, the method of calculating the step-size
parameter described by (4.40) truly is optimal.
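The rearrangement leading to (4.42) can be spelled out as follows. The shorthand $P = J_{min} + \sigma_t^2$ and $J = J_{ex}(\infty)$ is introduced here only to keep the algebra compact and does not appear elsewhere in the text; the step-size expression is the one in (4.40) evaluated at $n = \infty$, for which $\mu^*(\infty)L\sigma_x^2 = \alpha J/(J + P)$.

```latex
\begin{align*}
J &= \frac{\mu^*(\infty)L\sigma_x^2\,P}{2 - \mu^*(\infty)L\sigma_x^2}
   = \frac{\dfrac{\alpha J}{J+P}\,P}{2 - \dfrac{\alpha J}{J+P}}
   = \frac{\alpha J P}{2(J+P) - \alpha J} \\
\Rightarrow\quad 0 &= J\left[2(J+P) - \alpha J\right] - \alpha J P
   = (2-\alpha)\,J\,(J + P)
\end{align*}
```

Since $0 < \alpha < 2$ and $J + P \geq 0$, the only nonnegative solution in general is $J = 0$.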

4.3.2  Modifications  to  the  optimal  step-size  parameter 

The expression for the optimal step-size parameter given by (4.40) cannot be
implemented in a real system, since it requires exact knowledge either of $J_{ex}(n)$ or
equivalently of the autocorrelation matrix, $\mathbf{R}$, and the current weight error correlation
matrix $\mathbf{K}(n)$. None of these quantities is known. $\mathbf{R}$ is implicitly estimated by
the LMS algorithm, and $\mathbf{K}(n)$ can be computed from the current weights only if the
optimal weights, $\mathbf{w}^*$, are known. Obviously, if the optimal weights were known, there
would be no need for any adaptive algorithm. The dependence on these quantities
is not surprising, however, since $J_{ex}(n)$ and $\mathbf{K}(n)$ both measure the deviation of the
current weights from the optimal weights, and intuitively, the “best” size step to take
at any point depends on the magnitude of that deviation.

In order to determine an expression for the step-size parameter that can be
implemented in a real system, additional assumptions are required. The first approach
approximates the optimal method derived in the previous section by using (4.40) with
an estimate for the excess mse, that is,

$$\hat\mu^*(n) = \frac{\alpha\hat J_{ex}(n)}{L\sigma_x^2\left(\sigma_t^2 + J_{min} + J_{ex}(n)\right)} = \frac{\alpha\hat J_{ex}(n)}{L\sigma_x^2E[y^2(n)]}. \tag{4.44}$$

The estimate of the excess mse is based on powers of signals available to the adaptive
processor, specifically,

$$\hat J_{ex}(n) = E[y^2(n)] + \sigma_x^2 - \sigma_{pri}^2, \tag{4.45}$$

where $\sigma_{pri}^2$ is the power of the primary input signal. Implementing this method in
a real system requires estimating the power of three signals, the primary input, the
reference input, and the system output, to produce the time-varying estimates $\hat\sigma_{pri}^2(n)$,
$\hat\sigma_x^2(n)$, and $\hat\sigma_y^2(n)$. The estimate of the primary input power can be considered the
sum of time-varying estimates of $\sigma_t^2$ and $\sigma_j^2$, while the estimate of the system output
power can be considered the sum of a time-varying estimate of $\sigma_t^2$ and two terms due
to the jammer, $J_{min}$ and $J_{ex}(n)$. Mathematically,

$$\hat\sigma_{pri}^2(n) = \hat\sigma_{t,p}^2(n) + \hat\sigma_{j,p}^2(n) \tag{4.46}$$

and

$$\hat\sigma_y^2(n) = \hat\sigma_{t,y}^2(n) + J_{min} + J_{ex}(n), \tag{4.47}$$

where $\hat\sigma_{t,y}^2(n)$ is the estimate of target power derived from the system output, $\hat\sigma_{t,p}^2(n)$
is the estimate of target power derived from the primary input, and $\hat\sigma_{j,p}^2(n)$ is the
estimate of jammer power derived from the primary input. Substituting the estimates
given by (4.46) and (4.47) into (4.45) and rearranging yields

$$\hat J_{ex}(n) = J_{ex}(n) + J_{min} + \hat\sigma_x^2(n) - \hat\sigma_{j,p}^2(n) + \hat\sigma_{t,y}^2(n) - \hat\sigma_{t,p}^2(n). \tag{4.48}$$

This shows that $\hat J_{ex}(n)$ is a good estimate of $J_{ex}(n)$ when the minimum mse, $J_{min}$, is
small and the power estimates are accurate, that is, $\hat\sigma_x^2(n) \approx \hat\sigma_{j,p}^2(n)$ and
$\hat\sigma_{t,y}^2(n) \approx \hat\sigma_{t,p}^2(n)$. In a real system, the value of $\hat J_{ex}(n)$ may be negative due to fluctuations in
the power estimates. When this occurs, $\hat J_{ex}(n)$ can be replaced with zero for that
iteration.
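A minimal sketch of this approximation-to-optimal step size is given below. It assumes the excess-mse estimate is formed as output power plus reference power minus primary power, which is the combination implied by (4.46) through (4.48), and that negative estimates are clipped to zero as described above. The function and argument names are hypothetical, and how the three running power estimates are obtained is left outside the sketch.

```python
def approx_optimal_step(alpha, L, p_out, p_ref, p_pri):
    """Approximation-to-optimal step size, a sketch of (4.44) with (4.45).

    p_out, p_ref and p_pri are running power estimates of the system
    output, reference input and primary input (hypothetical names).
    p_out and p_ref are assumed to be strictly positive.
    """
    j_ex_hat = p_out + p_ref - p_pri   # estimate of the excess mse
    j_ex_hat = max(j_ex_hat, 0.0)      # a negative estimate is replaced with zero
    return alpha * j_ex_hat / (L * p_ref * p_out)

# Illustrative use: output still contains 0.5 units of uncancelled jammer.
mu_hat = approx_optimal_step(alpha=0.2, L=10, p_out=1.5, p_ref=1.0, p_pri=2.0)
```

When the estimate is clipped to zero the weights are simply held for that iteration, so adaptation pauses rather than stepping in a direction chosen from an unreliable estimate.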

To determine the steady-state excess mse associated with this approximation-to-optimal
method, it is necessary to make some assumptions about the estimation error.
The estimation error, $\Delta$, is defined by

$$\Delta = \hat J_{ex}(n) - J_{ex}(n) \tag{4.49}$$

and is related to the output error power associated with the weight error vector
implicitly defined by $\hat J_{ex}(n)$. In order to simplify the following analysis, it is assumed
that $\Delta$ is a positive constant. Substituting (4.44) and (4.49) at $n = \infty$ into (4.30)
and rearranging yields

$$\left((2 - \alpha)J_{ex}(\infty) - \alpha\Delta\right)\left(J_{ex}(\infty) + J_{min} + \sigma_t^2\right) = 0, \tag{4.50}$$

with the positive solution

$$J_{ex}(\infty) = \frac{\alpha\Delta}{2 - \alpha}. \tag{4.51}$$


If $\hat J_{ex}(n)$ is exactly equal to $J_{ex}(n)$, then this method is equivalent to the optimal
method, the estimation error, $\Delta$, is zero, and the steady-state performance is optimal,
that is, $J_{ex}(\infty) = 0$. From (4.48), this only occurs if $J_{min} = 0$, $\hat\sigma_x^2(n) = \hat\sigma_{j,p}^2(n)$, and
$\hat\sigma_{t,y}^2(n) = \hat\sigma_{t,p}^2(n)$. Any error in the estimate of $J_{ex}(n)$ will cause the steady-state excess
mse to be nonzero and proportional to the estimation error, $\Delta$. Defining constants $c_x$
and $c_t$ to indicate the fractional error in the estimates,

$$c_x = \frac{\hat\sigma_x^2(n) - \hat\sigma_{j,p}^2(n)}{\sigma_x^2} \tag{4.52}$$

$$c_t = \frac{\hat\sigma_{t,y}^2(n) - \hat\sigma_{t,p}^2(n)}{\sigma_t^2} \tag{4.53}$$

and using (4.48) and (4.49), it can be seen that the estimation error, $\Delta$, is composed
of three quantities, proportional to the powers $J_{min}$, $\sigma_x^2$, and $\sigma_t^2$,

$$\Delta = J_{min} + c_x\sigma_x^2 + c_t\sigma_t^2. \tag{4.54}$$

Substituting this expression for $\Delta$ into (4.51) shows that the approximation-to-optimal
method described by (4.44) and (4.45) results in a steady-state excess mse that
increases with target signal power.

A second candidate for adjusting the step-size parameter can be obtained by
making assumptions about the weight error correlation matrix. If the individual
weight errors are independent, identically-distributed random variables with variance
$c$,⁶ then $\mathbf{K}(n) = c\mathbf{I}$. Using this assumption and (4.10) in (4.27) gives

$$J_{ex}(n) = \mathrm{tr}[\mathbf{R}\mathbf{K}(n)] = c\,\mathrm{tr}[\mathbf{R}] = cL\sigma_x^2, \tag{4.55}$$

and (4.40) becomes

$$\mu(n) = \frac{\alpha c}{\sigma_t^2 + J_{min} + J_{ex}(n)} = \frac{\alpha c}{E[y^2(n)]}. \tag{4.56}$$

Since $c$ and $\alpha$ are both constants and the value of $c$ is arbitrary, it can be eliminated
without loss of generality. Furthermore, to simplify comparisons with other methods,
it will be useful to include a factor of $1/L$, yielding

$$\mu_{out}(n) = \frac{\alpha}{L\left(\sigma_t^2 + J_{min} + J_{ex}(n)\right)} = \frac{\alpha}{LE[y^2(n)]}, \tag{4.57}$$

the output method for calculating the step-size parameter. Note that the quantity
$E[y^2(n)]$ can be estimated from the output of the system and that no assumptions
have been made about the autocorrelation matrix, $\mathbf{R}$.

Even if individual weight errors are independent, identically-distributed random
variables as assumed above, their variance will not remain constant as the LMS
algorithm converges in response to new inputs. As a result, the output method
produces a steady-state excess mse that is nonzero, but independent of the target
power. This can be seen by substituting (4.57) for $n = \infty$ into (4.30) and rearranging
terms, yielding

$$\left(2J_{ex}(\infty) - \alpha\sigma_x^2\right)\left(J_{ex}(\infty) + J_{min} + \sigma_t^2\right) = 0. \tag{4.58}$$

The positive solution to (4.58) is

$$J_{ex}(\infty) = \frac{\alpha\sigma_x^2}{2}, \tag{4.59}$$

which is independent of the target power.

⁶The entries of the weight error correlation matrix, $\mathbf{K}(n)$, are dimensionless, because the weights
themselves are dimensionless.

The final method proposed for calculating the step-size parameter is obtained by
combining the advantages of the traditional and output methods. Recall that for
the traditional method, the step-size parameter is normalized by the reference input
power, $\sigma_x^2$, producing a steady-state excess mse that is proportional to the target signal
power. For the output method, the step-size parameter is normalized by the system
output power, $E[y^2(n)]$, producing a steady-state excess mse that is independent of
the target signal power. Therefore, the advantage of the traditional method is that
the steady-state excess mse is very small in the presence of weak target signals, while
the advantage of the output method is that the steady-state excess mse is constant
in the presence of strong target signals. Both of these advantages can be obtained by
normalizing by the sum of the reference input and system output powers, according
to

$$\mu_{sum}(n) = \frac{\alpha}{L\left(\sigma_x^2 + \sigma_t^2 + J_{min} + J_{ex}(n)\right)} = \frac{\alpha}{L\left(\sigma_x^2 + E[y^2(n)]\right)}. \tag{4.60}$$

This method of calculating the step-size parameter was used by Greenberg and Zurek
(1992).

The steady-state performance of the sum method can be determined by substituting
(4.60) for $n = \infty$ into (4.30) and rearranging, which yields

$$2\left(J_{ex}(\infty)\right)^2 + \left[2(J_{min} + \sigma_t^2) + (2 - \alpha)\sigma_x^2\right]J_{ex}(\infty) - \alpha\sigma_x^2(J_{min} + \sigma_t^2) = 0. \tag{4.61}$$

The positive solution of (4.61) is

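$$J_{ex}(\infty) = \frac{\sqrt{\left[(2-\alpha)\sigma_x^2 + 2(J_{min}+\sigma_t^2)\right]^2 + 8\alpha\sigma_x^2(J_{min}+\sigma_t^2)} - (2-\alpha)\sigma_x^2 - 2(J_{min}+\sigma_t^2)}{4}. \tag{4.62}$$

| | traditional | optimal | optimal approx | output | sum |
|---|---|---|---|---|---|
| adaptive step-size parameter | $\mu_{trad} = \dfrac{\alpha}{L\sigma_x^2}$ | $\mu^*(n) = \dfrac{\alpha J_{ex}(n)}{L\sigma_x^2E[y^2(n)]}$ | $\hat\mu^*(n) = \dfrac{\alpha\hat J_{ex}(n)}{L\sigma_x^2E[y^2(n)]}$ | $\mu_{out}(n) = \dfrac{\alpha}{LE[y^2(n)]}$ | $\mu_{sum}(n) = \dfrac{\alpha}{L(\sigma_x^2 + E[y^2(n)])}$ |
| convergence | $0 < \alpha < 2$ | $0 < \alpha < 2$ | $0 < \alpha < 2$ if $\Delta < \sigma_t^2 + J_{min}$ | $0 < \alpha < \dfrac{2E[y^2(n)]}{\sigma_x^2}$ | $0 < \alpha < 2$ |
| steady-state excess mse $J_{ex}(\infty)$ | $\dfrac{\alpha(J_{min}+\sigma_t^2)}{2-\alpha}$ | $0$ | $\dfrac{\alpha\Delta}{2-\alpha}$ | $\dfrac{\alpha\sigma_x^2}{2}$ | $\to 0$ as $(J_{min}+\sigma_t^2) \to 0$; $\to \dfrac{\alpha\sigma_x^2}{2}$ as $(J_{min}+\sigma_t^2) \to \infty$ |

Table 4.1: Summary of results for five methods of adjusting the step-size parameter
described in the text.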


It can be shown that

$$\lim_{(J_{min}+\sigma_t^2)\to 0} J_{ex}(\infty) = 0 \tag{4.63}$$

and

$$\lim_{(J_{min}+\sigma_t^2)\to\infty} J_{ex}(\infty) = \frac{\alpha\sigma_x^2}{2}, \tag{4.64}$$

confirming that the sum method has the advantages of both the traditional and output
methods at the two extremes of target signal power.
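The two limits can be checked numerically with a short sketch. It takes the steady-state excess mse of the sum method to be the positive root of the quadratic $2J^2 + [2P + (2-\alpha)\sigma_x^2]J - \alpha\sigma_x^2P = 0$, where $P = J_{min} + \sigma_t^2$ is a shorthand used only in this example; the function names and numerical values are likewise illustrative assumptions.

```python
import math

def jex_traditional(alpha, p):
    """Traditional method: alpha * P / (2 - alpha), with P = J_min + target power."""
    return alpha * p / (2.0 - alpha)

def jex_output(alpha, p_ref):
    """Output method: alpha * reference power / 2, independent of P."""
    return alpha * p_ref / 2.0

def jex_sum(alpha, p_ref, p):
    """Sum method: positive root of 2J^2 + bJ - alpha*p_ref*P = 0."""
    b = (2.0 - alpha) * p_ref + 2.0 * p
    return (math.sqrt(b * b + 8.0 * alpha * p_ref * p) - b) / 4.0

alpha, p_ref = 0.2, 1.0
weak = jex_sum(alpha, p_ref, 1e-6)     # very weak target plus minimum mse
strong = jex_sum(alpha, p_ref, 1e6)    # very strong target
```

For the weak case the result is close to the traditional value, and for the strong case it is close to the output-method value, in line with the limiting behavior stated above.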

The five methods of calculating the step-size parameter presented above are
summarized in Table 4.1. For each of these methods, it is necessary to consider the limits
on $\alpha$ required for the adaptive algorithm to converge. From Widrow and Stearns
(1985), the algorithm converges for $0 < \mu < 2/\lambda_{max}$, and a conservative upper bound
can be found by replacing the maximum eigenvalue, $\lambda_{max}$, with the sum of all
eigenvalues, $\sum_i\lambda_i$, which, from (4.10), equals $L\sigma_x^2$. Using this conservative upper bound,
the condition required for convergence is $0 < \mu < 2/(L\sigma_x^2)$. Equations (4.19), (4.40),
(4.44), (4.57), and (4.60) were each substituted for $\mu$ in this inequality, and the
resulting limits on the dimensionless step-size parameter, $\alpha$, are included in Table 4.1.
The limit given for the sum method represents an extremely conservative upper bound
based on the assumption that $E[y^2(n)] = 0$. The limit given for the output method
reveals a potential problem, since the upper bound on $\alpha$ will be very small when the
algorithm has converged and the target signal is weak. In a practical system this
could be overcome by selecting $\alpha$ assuming a minimum power level for $E[y^2(n)]$ and
then substituting the minimum value into (4.57) whenever that minimum exceeds the
current estimate of $E[y^2(n)]$.
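The safeguard described for the output method amounts to placing a floor under the output-power estimate before it is used to normalize the step size. A minimal sketch follows; the function name and the floor parameter `p_out_floor` are hypothetical, and choosing the floor value is a design decision that is not specified here.

```python
def output_method_step(alpha, L, p_out_est, p_out_floor):
    """Output-method step size alpha / (L * E[y^2(n)]) with a power floor.

    p_out_floor is a hypothetical design constant: the minimum power level
    assumed for E[y^2(n)] when alpha was selected.
    """
    if p_out_floor <= 0.0:
        raise ValueError("p_out_floor must be positive")
    # Use the floor whenever it exceeds the current output-power estimate.
    p_out = max(p_out_est, p_out_floor)
    return alpha / (L * p_out)

mu_normal = output_method_step(alpha=0.2, L=10, p_out_est=0.5, p_out_floor=0.01)
mu_floored = output_method_step(alpha=0.2, L=10, p_out_est=0.001, p_out_floor=0.01)
```

The floor bounds the step size from above, so the step cannot grow without limit when the canceller has converged and the target is weak.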

4.3.3  Comparison  of  methods  for  calculating  the  step-size 
parameter. 

The  five  methods  described  above  for  calculating  the  step-size  parameter  (traditional, 
optimal,  approximation-to-optimal,  output,  and  sum)  are  compared  on  the  basis  of 
steady-state  performance  and  transient  behavior. 

Steady-state  performance 

The measure of steady-state performance is the steady-state excess mse, $J_{ex}(\infty)$,
which reflects the error due to the ongoing adaptive process, that is, the fluctuation
of the weights about their optimal values after they have converged in the mean. The
steady-state excess mse was calculated for each of the five methods in the previous
sections, and the results are included in Table 4.1.

Figure 4.1 shows the steady-state performance for all but the optimal method in terms
of the jammer gain due to the system as a function of the input TJR
($\sigma_t^2/\sigma_j^2$). This is similar to the normalized residual noise used by Lu
and Clarkson (1993). Values of jammer gain less than unity (0 dB) indicate beneficial
performance due to the adaptive noise canceller, while values of the jammer gain that
exceed unity indicate that the output of the system is degraded relative to the input.
The excess mse for the different methods was calculated according to (4.31), (4.43),
(4.51), (4.59), and (4.62), for $\alpha = 0.2$, $L = 10$, and $J_{\min} = 0$. Figure 4.2
shows the steady-state performance for all five methods with the same parameter values
as Fig. 4.1, except that $J_{\min}/\sigma_j^2 = 0.33$. For both figures, the estimation
error, $\Delta$, in (4.51) was computed according to (4.54), with both error coefficients
equal to 0.05, corresponding to 5 percent error in the power estimates. Results for the
optimal method are not shown in Fig. 4.1, since, with $J_{\min} = 0$, the jammer gain is
zero ($-\infty$ dB).

Figure 4-1: Steady-state performance for four methods of computing the step-size
parameter. The plot shows the jammer gain due to the system as a function of the input
TJR. The four methods are the traditional method, the approximation-to-optimal
method, the output method, and the sum method, labeled trad, approx, output, and
sum, respectively. The curves are based on (4.31), (4.51), (4.59), and (4.62) from the text.

Figure 4-2: Steady-state performance for five methods of computing the step-size
parameter. The plot shows the jammer gain due to the system as a function of the
input TJR (dB). The five methods are the traditional method, the optimal method, the
approximation-to-optimal method, the output method, and the sum method, labeled
trad, opt, approx, output, and sum, respectively. The curves are based on (4.31),
(4.43), (4.51), (4.59), and (4.62) from the text.

For the optimal method of calculating the step-size parameter, the steady-state
excess mse is zero. This verifies that, at least in the steady state, the method derived
in Section 4.3.1 truly is optimal. However, the optimal method requires knowledge
of quantities that are not available to a real system. Because of the assumptions
made to derive the approximation-to-optimal, output, and sum methods, the resulting
steady-state performance is suboptimal, with nonzero values of steady-state excess
mse. However, from Figs. 4.1 and 4.2, it is clear that all three of these methods
are preferable to the traditional method in the presence of appreciable target signal
power. The performance of the sum method is particularly attractive, since the excess
jammer power both approaches zero for weak target signals and remains limited for
strong target signals.

Transient  behavior 

In this section, expressions are derived to characterize the transient behavior of the
LMS algorithm with the five different methods of calculating the step-size parameter.
First, iterative expressions are determined for the method of steepest descent,
whose transient behavior is considerably easier to analyze than that of the LMS
algorithm. Then, those results are related to the transient behavior of the LMS algorithm.
Finally, additional assumptions are used to determine the time constants of simple
exponential decays that approximate the transient behavior of the sum method of
calculating the step-size parameter.

Expressions for the transient behavior of the LMS algorithm with a constant
(traditional) step-size parameter are available in the literature (e.g., Haykin, 1986,
Eq. 5.111), but these expressions are quite complicated and do not lend themselves to
intuitive interpretations of convergence time. However, under independence theory,
the ensemble average of the weights (for identical initial conditions and different input
sequences) is equivalent to the weights obtained by the method of steepest descent,
which uses the exact gradient at each iteration. Therefore, the expressions derived
here are based on the method of steepest descent, and they characterize the mean
transient behavior of the weights (Alexander, 1986) and provide a lower bound on
the transient behavior of the excess mse (Widrow and Stearns, 1985).

The transient properties of the steepest descent algorithm are analyzed by considering
the decay of different modes associated with the different eigenvalues and
eigenvectors of $\mathbf{R}$ as in Widrow and Stearns (1985) and Haykin (1986). The
autocorrelation matrix is decomposed to

$$ \mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T \tag{4.65} $$

where the columns of $\mathbf{Q}$ are the normalized eigenvectors of $\mathbf{R}$, and
$\boldsymbol{\Lambda}$ is a diagonal matrix with diagonal elements equal to $\lambda_i$,
the eigenvalues of $\mathbf{R}$. $\mathbf{Q}$ is orthonormal, and its geometrical
interpretation is a rotation that aligns the coordinate space with the principal axes of
the hyperparaboloid representing the error surface. This rotation can be applied to the
weight error vector to produce a rotated weight error vector,

$$ \boldsymbol{\nu}(n) = \mathbf{Q}^T\mathbf{v}(n), \tag{4.66} $$

with the behavior of the individual modes decoupled. This permits the description
of the excess mse as

$$ J_{ex}(n) = \sum_{i=1}^{L} \lambda_i (1 - \mu\lambda_i)^{2n} |\nu_i(0)|^2 \tag{4.67} $$

(Haykin, 1986, Eq. 5.28) where $\nu_i(0)$ is the $i$th element of the initial rotated error
vector, $\boldsymbol{\nu}(0)$. Because the method of steepest descent uses the exact
gradient, the performance is characterized by the weight error vector, $\mathbf{v}(n)$,
and the rotated vector $\boldsymbol{\nu}(n)$. It is not necessary to take the expected
value, as it was in the analysis of the LMS algorithm.

For the traditional method of calculating the step-size parameter, (4.67) represents
a sum of exponential decays. Each mode is associated with its own time constant,

$$ \tau_i = \frac{-1}{2\ln(1 - \mu\lambda_i)} \tag{4.68} $$

(Haykin, 1986, Eq. 5.30), which is approximated as

$$ \tau_i \approx \frac{1}{2\mu\lambda_i} \tag{4.69} $$

for $\mu\lambda_i \ll 1$ (Haykin, 1986, Eq. 5.31). Substituting (4.19) into (4.69) yields

$$ \tau_i \approx \frac{L\sigma_x^2}{2\alpha\lambda_i} \tag{4.70} $$

for $\alpha\lambda_i \ll L\sigma_x^2$. Haykin (1986, p. 236) points out that although small
eigenvalues of $\mathbf{R}$ lead to slowly converging terms in the transient component of
$J_{ex}(n)$, these small eigenvalues also correspond to modes that make a relatively
small contribution to $J_{ex}(n)$.
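The size of the error introduced by the small-step approximation can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes the traditional step size has the form $\mu = \alpha/(L\sigma_x^2)$ (inferred from the condition $\alpha\lambda_i \ll L\sigma_x^2$), and the function names are hypothetical.

```python
import math

def time_constant_exact(mu, lam):
    """Exact mode time constant: tau = -1 / (2 ln(1 - mu*lam))."""
    return -1.0 / (2.0 * math.log(1.0 - mu * lam))

def time_constant_approx(mu, lam):
    """Small-step approximation (4.69): tau ~ 1 / (2 mu lam)."""
    return 1.0 / (2.0 * mu * lam)

# Parameter values used elsewhere in this chapter: alpha = 0.2, L = 10,
# equal eigenvalues lambda_i = sigma_x^2 (normalized to 1 here).
alpha, L, sigma_x2 = 0.2, 10, 1.0
lam = sigma_x2

# Assumed form of the traditional step size, mu = alpha / (L sigma_x^2).
mu_trad = alpha / (L * sigma_x2)

tau_exact = time_constant_exact(mu_trad, lam)
tau_approx = time_constant_approx(mu_trad, lam)
print(f"exact tau = {tau_exact:.3f} samples, approximate tau = {tau_approx:.3f} samples")
```

For these values the approximation overestimates the time constant by roughly one percent, which suggests the simpler form is adequate for the step sizes considered here.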

The transient behavior of the proposed methods for calculating the step-size parameter
can be characterized using the above approach. However, since $\mu$ in (4.67) is
replaced by the time-varying $\mu(n)$, the resulting decay is not necessarily exponential.
Further, $\mu(n)$ may depend on $J_{ex}(n)$, so that the form of the decay may change
as the power of the excess mse changes with respect to other power levels. Therefore,
it will be useful to consider the change in $J_{ex}(n)$ at each iteration. This requires
defining the components of $J_{ex}(n)$ attributable to the individual modes,
$J_{ex,i}(n)$, so that

$$ J_{ex}(n) = \sum_{i=1}^{L} J_{ex,i}(n). \tag{4.71} $$

With this definition and (4.67),

$$ J_{ex}(n+1) = \sum_{i=1}^{L} J_{ex,i}(n+1) = \sum_{i=1}^{L} \bigl(1 - \mu(n)\lambda_i\bigr)^2 J_{ex,i}(n). \tag{4.72} $$

Substituting the appropriate expressions for $\mu(n)$ into (4.72), it is possible to
obtain a recursive formula for the excess mse associated with each of the proposed
methods of calculating the step-size parameter. For example, for the sum method,
substituting (4.60) into (4.72) gives

$$ J_{ex}(n+1) = \sum_{i=1}^{L} J_{ex,i}(n+1) = \sum_{i=1}^{L} \left[ 1 - \frac{\alpha\lambda_i}{L\bigl(\sigma_x^2 + J_{ex}(n) + J_{\min} + \sigma_t^2\bigr)} \right]^2 J_{ex,i}(n). \tag{4.73} $$

Similarly, substituting (4.40), (4.44), and (4.57) into (4.72) produces iterative formulas
for the excess mse associated with the optimal, approximation-to-optimal, and
output methods, respectively. Iterating these recursive formulas for known parameter
values produces a smooth decay curve that corresponds to the transient behavior of
the steepest descent algorithm. The relation to the transient behavior of the LMS
algorithm is discussed below.
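The recursion for the sum method is straightforward to iterate numerically. The sketch below is illustrative only: it assumes the sum-method step size takes the form $\mu(n) = \alpha / [L(\sigma_x^2 + J_{ex}(n) + J_{\min} + \sigma_t^2)]$, as implied by (4.73), and it splits the initial excess mse equally among the modes, which is an arbitrary choice that does not matter when all eigenvalues are equal. The function name and argument names are hypothetical.

```python
def sum_method_transient(alpha, L, sigma_x2, sigma_t2, j_min, eigvals, jex0, n_iter):
    """Iterate the per-mode recursion (4.72) with the sum-method step size.

    Returns a list with the total excess mse at iterations 0..n_iter.
    """
    # Assumption: initial excess mse is shared equally by all modes.
    modes = [jex0 / len(eigvals)] * len(eigvals)
    history = [sum(modes)]
    for _ in range(n_iter):
        jex = sum(modes)
        # Step size normalized by reference power plus output power.
        mu = alpha / (L * (sigma_x2 + jex + j_min + sigma_t2))
        modes = [(1.0 - mu * lam) ** 2 * m for lam, m in zip(eigvals, modes)]
        history.append(sum(modes))
    return history

# Equal eigenvalues, J_min = 0, J_ex(0) equal to the jammer power,
# all normalized so that sigma_x^2 = sigma_j^2 = 1.
alpha, L = 0.2, 10
eigvals = [1.0] * L

# Input TJR of +10 dB: target power is ten times the jammer power.
high_tjr = sum_method_transient(alpha, L, 1.0, 10.0, 0.0, eigvals, 1.0, 1000)
# Input TJR of -10 dB: target power is one tenth of the jammer power.
low_tjr = sum_method_transient(alpha, L, 1.0, 0.1, 0.0, eigvals, 1.0, 1000)

print(f"TJR +10 dB: J_ex(275) = {high_tjr[275]:.3f}")
print(f"TJR -10 dB: J_ex(50)  = {low_tjr[50]:.3f}")
```

Plotting `history` on a logarithmic axis gives decay curves of the kind described in the text; the curve is a straight line only where the excess mse is small compared with the other terms in the normalization.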

Figure 4.3 shows the transient behavior predicted by (4.72) for the five methods of
calculating the step-size parameter. The three parts of the figure show the transient
behavior for input TJRs of -10, 0, and +10 dB. The parameter values used to
generate these curves were $\alpha = 0.2$, $L = 10$, $J_{\min} = 0$,
$\sigma_x^2 = \sigma_j^2$, $J_{ex}(0) = \sigma_j^2$, and $\lambda_i = \sigma_x^2$ for all $i$.

The curves shown in Fig. 4.3 describe the transient behavior of the steepest descent
algorithm and can be interpreted to provide some understanding of the transient
behavior of the LMS algorithm. The transient performance of the steepest descent
algorithm corresponds to the mean transient behavior of the weights and a lower
bound on the transient behavior of the excess mse for the LMS algorithm. Although
the curves in Fig. 4.3 all converge to zero (in general, to the level set by $J_{\min}$),
the steady-state performance of the LMS algorithm is nonzero for all but the optimal
method, as seen in Fig. 4.1. The implication of Fig. 4.3 for the performance of the LMS
algorithm lies not in the value to which the curves converge, but in the rate of
convergence, since the transient performance of the steepest descent algorithm is the
same as the mean behavior of the LMS weights. For example, for the traditional method
with input TJR = -10 dB, the jammer gain starts at unity and converges to a steady-state
value near 0.01 (-20 dB). The ensemble average of performance curves for different
input samples with the same parameter values will exhibit transient behavior with the
shape of the traditional method curve in the top panel of Fig. 4.3, but will converge
to the steady-state value of 0.01.

Figure 4-3: Transient behavior for five methods of computing the step-size parameter.
The curves are based on substituting the appropriate expression for computing the
step-size parameter into (4.72). The three panels show the behavior for input TJRs
of -10, 0, and 10 dB.

With this understanding, it is possible to make several observations by comparing
Figs. 4.1 and 4.3. First, in general, the conditions producing poorer steady-state
performance (the output method at low TJR and the traditional method at high
TJR) converge faster than those that produce more favorable steady-state performance.
This is a direct result of the fundamental tradeoff inherent in selection of
the dimensionless step-size parameter, $\alpha$. Second, despite the sometimes rapid initial
convergence of the optimal method, it always exhibits the slowest convergence overall.
This is because of the presence of the excess mse in the numerator of the optimal
step-size parameter given by (4.40). As the excess mse converges, the step-size parameter
is gradually reduced, and the convergence slows as it progresses. Finally, for a practical
system, at high TJR the transient behavior is similar for the sum, output, and
approximation-to-optimal methods. At low TJR, the sum method converges slightly
more slowly than the traditional and approximation-to-optimal methods, but that
disadvantage is insignificant compared to the improved steady-state performance obtained
with the sum method.

Although the curves in Fig. 4.3 were computed for the case of equal eigenvalues,
similar results are obtained when the eigenvalues are not equal. Unequal eigenvalues
cause the different modes to converge at different rates, and the overall convergence
is the sum of the modes. For a single eigenvalue, the transient behavior of the
corresponding mode is of the form of Fig. 4.3 to within a scaling of the abscissa. Therefore,
the transient behavior of the total excess mse corresponds to the sum of such scaled
curves, and the relative performance for the different methods will follow the trends
shown in Fig. 4.3.

Estimates of the convergence time based on an exponential decay can be obtained
for particular situations by making additional assumptions. This will be demonstrated
in the following derivation for the sum method, the practical method with
the most promising steady-state performance. Similar derivations can be performed
to gain insight into the transient performance of the other methods under particular
conditions.

The relation described by (4.73) does not represent a sum of simple exponential
decays, due to the presence of the quantity $J_{ex}(n)$ in the denominator. If the excess
mse is small relative to the power of the other signals in the normalization
($J_{ex}(n) \ll \sigma_x^2 + \sigma_t^2 + J_{\min}$), then the effect of $J_{ex}(n)$ in the
denominator of (4.73) is negligible and the decay associated with each mode will be
exponential with time constant

$$ \tau_i \approx \frac{L(\sigma_x^2 + \sigma_t^2 + J_{\min})}{2\alpha\lambda_i} \tag{4.74} $$

for $\alpha\lambda_i \ll L(\sigma_x^2 + \sigma_t^2 + J_{\min})$. On the other hand, if the
excess mse is large relative to the uncancellable signal powers
($J_{ex}(n) \gg \sigma_t^2 + J_{\min}$), then (4.73) becomes

$$ J_{ex}(n+1) = \sum_{i=1}^{L} J_{ex,i}(n+1) \approx \sum_{i=1}^{L} \left[ 1 - \frac{\alpha\lambda_i}{L\bigl(\sigma_x^2 + J_{ex}(n)\bigr)} \right]^2 J_{ex,i}(n) \tag{4.75} $$

and the effect of $J_{ex}(n)$ in the denominator cannot be ignored. At the beginning of the
adaptive process, the initial excess mse can be approximated by the primary jammer
power, which is roughly equal to the power of the reference input
($J_{ex}(0) \approx \sigma_j^2 \approx \sigma_x^2$). Substituting $\sigma_x^2$ for
$J_{ex}(n)$ in the last term of (4.75) reveals that the excess mse associated with each
mode initially decays with a time constant given by

$$ \tau_i \approx \frac{L\sigma_x^2}{\alpha\lambda_i} \tag{4.76} $$

for $\alpha\lambda_i \ll 2L\sigma_x^2$. As the excess mse decays from its initial value,
$J_{ex}(n)$ becomes smaller than $\sigma_x^2$, the second term in the denominator of
(4.73) decreases, and the excess mse decays more rapidly than indicated by (4.76).
Therefore (4.76) provides a conservative estimate of the time constant.
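As a quick numerical check of the two limiting time constants, the sketch below evaluates (4.74) and (4.76) for the parameter values used in this chapter ($\alpha = 0.2$, $L = 10$, $J_{\min} = 0$, equal eigenvalues $\lambda_i = \sigma_x^2$). The function names are hypothetical, and powers are normalized so that $\sigma_x^2 = 1$.

```python
def tau_small_excess(alpha, L, lam, sigma_x2, sigma_t2, j_min):
    """Time constant (4.74), valid when the excess mse is small
    compared with sigma_x^2 + sigma_t^2 + J_min."""
    return L * (sigma_x2 + sigma_t2 + j_min) / (2.0 * alpha * lam)

def tau_initial(alpha, L, lam, sigma_x2):
    """Conservative initial time constant (4.76), obtained by setting
    J_ex(n) = sigma_x^2 in the denominator of (4.75)."""
    return L * sigma_x2 / (alpha * lam)

alpha, L, lam, sigma_x2, j_min = 0.2, 10, 1.0, 1.0, 0.0

# Target powers for input TJRs of -10, 0, and +10 dB (jammer power = 1).
for tjr_db in (-10, 0, 10):
    sigma_t2 = 10.0 ** (tjr_db / 10.0)
    t74 = tau_small_excess(alpha, L, lam, sigma_x2, sigma_t2, j_min)
    t76 = tau_initial(alpha, L, lam, sigma_x2)
    print(f"TJR {tjr_db:+d} dB: (4.74) gives {t74:.1f}, (4.76) gives {t76:.1f} samples")

tau_high = tau_small_excess(alpha, L, lam, sigma_x2, 10.0, j_min)
tau_low = tau_initial(alpha, L, lam, sigma_x2)
tau_mid = tau_small_excess(alpha, L, lam, sigma_x2, 1.0, j_min)
```

With these values, (4.74) gives 275 samples at an input TJR of 10 dB and (4.76) gives 50 samples, and the two expressions coincide at 0 dB, where $\sigma_x^2 = \sigma_t^2 + J_{\min}$.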

Figure 4.4 shows the transient performance for the sum method, computed from
the recursive formula given in (4.73), together with decaying exponentials with time
constants given by (4.74) and (4.76). The parameter values are the same as those used
to generate Fig. 4.3. As expected, when the TJR is high (bottom panel of Fig. 4.4),
the transient predicted by the recursive formula closely matches the exponential with
decay given by (4.74). When the TJR is low (top panel of Fig. 4.4), the transient
predicted by the recursive formula initially tracks the exponential with decay given
by (4.76). As the system converges, the transient predicted by the recursive formula
converges faster than the exponential, as expected. Eventually, the excess mse
converges to a level sufficiently low that the assumptions used to derive (4.76) are invalid,
and at some point the assumptions used to derive (4.74) become valid. For the case
of equal eigenvalues used to generate Fig. 4.4, the time constants given by (4.74) and
(4.76) are equivalent when the reference signal power equals the sum of the target
signal power and the minimum mse ($\sigma_x^2 = \sigma_t^2 + J_{\min}$). For the parameter
values used here, this occurs at TJR = 0 dB (middle panel of Fig. 4.4). In that case, the
two exponential curves coincide, and the performance predicted by (4.73) does not differ
substantially.

Footnote: It is not reasonable to assume that the excess mse is very much greater than
the reference input power ($J_{ex}(n) \gg \sigma_x^2$). It is assumed that the reference
input power ($\sigma_x^2$) is comparable to the primary jammer input power
($\sigma_j^2$); therefore this situation could only occur if the system provided
considerable amplification of the jammer signal.


4.4  Simulations 

To verify the results derived in the previous section, an adaptive noise canceller was
implemented in computer simulations with five methods of adjusting the step-size
parameter. The five methods were based on the traditional ($\mu_{trad}$), optimal
($\mu^*$), approximation-to-optimal ($\hat{\mu}^*$), output ($\mu_{out}$), and sum
($\mu_{sum}$) methods described by equations (4.19), (4.40), (4.44), (4.57), and (4.60),
respectively. Implementation of the optimal method requires knowledge of the optimal
values for the adaptive weights, $\mathbf{w}^*$, which is possible in a computer
simulation but not in a real system. The results for the optimal method are presented as
a benchmark of the performance and to confirm the analysis of the previous section. The
other four methods rely only on quantities available to the adaptive processor and
therefore can be implemented in a practical system.

Figure 4-4: Transient behavior for the sum method of computing the step-size
parameter. The solid curve is the true transient performance according to (4.73) and
the dashed and dotted curves are exponential decays that approximate the transient
behavior with time constants given by (4.74) and (4.76). The three panels show the
behavior for input TJRs of -10, 0, and 10 dB.

The target and jammer signals were generated from two mutually independent,
normally distributed noise sources, $t(n)$ and $z(n)$, with zero mean and unit variance.
The primary and reference jammer signals, $j(n)$ and $x(n)$, were generated from $z(n)$
for two different cases. In Case 1, the jammer signal in the primary input was a
delayed version of the jammer signal in the reference input, that is, $j(n) = z(n-2)$
and $x(n) = z(n)$. In Case 2, the jammer signals in the primary and reference
inputs were generated from the sum and difference of a delayed signal, with
$j(n)$ proportional to $z(n-2) + z(n)$ and $x(n)$ proportional to $z(n-2) - z(n)$, using
the same scale factor for both. The length of the adaptive filter was $L = 10$ and the
normalized step-size parameter was $\alpha = 0.2$. With these parameter values, in Case 1,
the minimum mse is $J_{\min} = 0$, and the eigenvalues of the autocorrelation matrix,
$\mathbf{R}$, are $\lambda_i = \sigma_x^2$ for all $i$. In Case 2, the minimum mse is
$J_{\min} = 0.33\sigma_j^2$, and the minimum and maximum eigenvalues are
$\lambda_{\min} = 0.13\sigma_x^2$ and $\lambda_{\max} = 1.87\sigma_x^2$. For both Cases 1
and 2, the target signal was scaled to produce values of input TJR from -20 dB to +20 dB
in increments of 10 dB. For each trial, the adaptive weights were initialized to the zero
vector.
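The Case 2 values of $J_{\min}$, $\lambda_{\min}$, and $\lambda_{\max}$ can be reproduced from the signal definitions. The following sketch is an illustration under stated assumptions rather than the procedure used in the original work: it normalizes the common power of $j(n)$ and $x(n)$ to 1 (the normalized results do not depend on the scale factor), builds the $10 \times 10$ autocorrelation matrix of $x(n)$ and the cross-correlation vector between $x(n)$ and $j(n)$, and uses plain Gaussian elimination and power iteration. All helper names are hypothetical.

```python
def toeplitz(first_row):
    """Symmetric Toeplitz matrix from its first row."""
    n = len(first_row)
    return [[first_row[abs(i - k)] for k in range(n)] for i in range(n)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def largest_eigenvalue(A, n_iter=500):
    """Power iteration; returns the Rayleigh quotient after n_iter steps."""
    v = [float(i + 1) for i in range(len(A))]
    for _ in range(n_iter):
        w = matvec(A, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, matvec(A, v)))

L = 10
sigma2 = 1.0  # assumed common power of j(n) and x(n)

# x(n) ~ z(n-2) - z(n): r(0) = sigma2, r(+/-2) = -sigma2/2, zero otherwise.
r = [0.0] * L
r[0], r[2] = sigma2, -sigma2 / 2
R = toeplitz(r)

# p(k) = E[x(n-k) j(n)] with j(n) ~ z(n-2) + z(n): only k = 2 is nonzero.
p = [0.0] * L
p[2] = -sigma2 / 2

w_opt = solve(R, p)                                   # Wiener weights
j_min = sigma2 - sum(pi * wi for pi, wi in zip(p, w_opt))

lam_max = largest_eigenvalue(R)
# Smallest eigenvalue via power iteration on the shifted matrix 2*sigma2*I - R.
shifted = [[(2 * sigma2 if i == k else 0.0) - R[i][k] for k in range(L)]
           for i in range(L)]
lam_min = 2 * sigma2 - largest_eigenvalue(shifted)

print(f"J_min = {j_min:.3f}, lambda_min = {lam_min:.3f}, lambda_max = {lam_max:.3f}")
```

Under these assumptions the computed values agree with the Case 2 figures quoted in the text to two decimal places.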

The required signal powers were calculated as follows. For the optimal method,
the excess mse, $J_{ex}(n)$, was replaced by its instantaneous value,
$[\mathbf{v}^T(n)\mathbf{x}(n)]^2$, in (4.40), and $E[y^2(n)]$ was computed according to
(4.28), by summing the known values of $\sigma_t^2$ and $J_{\min}$ with the instantaneous
value $[\mathbf{v}^T(n)\mathbf{x}(n)]^2$ instead of $J_{ex}(n)$. For the other
methods, the reference input power ($\sigma_x^2$), the primary input power, and the
output power ($E[y^2(n)]$) were estimated by squaring and then filtering the reference
input, primary input, and system output with a first-order recursive lowpass filter.
For the approximation-to-optimal method, the time constant of the lowpass filter was
100 samples. For the sum and output methods, the time constant was equal to the filter
length, $L = 10$.

Figures 4.5 and 4.6 show the simulation results (represented by discrete points) in
terms of jammer gain as a function of input TJR, for Cases 1 and 2, respectively. The
results were obtained by processing 300,000-sample source signals (see the footnote on
signal length) and using the last 10,000 samples of the output to compute the
steady-state error, $J(\infty)$, for each trial. For each condition, ten trials were
performed with different samples of the target and jammer source signals, and the
resulting steady-state error values were averaged over the ensemble of ten trials.
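A scaled-down version of the Case 1 experiment with the sum method can be sketched as follows. This is a hedged illustration, not the simulation code used for the figures: it assumes the weight update $\mathbf{w}(n+1) = \mathbf{w}(n) + \mu(n)\,y(n)\,\mathbf{x}(n)$ with $\mu(n) = \alpha / [L(\hat{\sigma}_x^2 + \hat{E}[y^2(n)])]$, maps the lowpass time constant of $L$ samples to a smoothing coefficient of $1 - 1/L$, uses a much shorter run (20,000 samples), and reports the residual jammer power at the output divided by the jammer power as a stand-in for jammer gain. The function name, seed, and run length are hypothetical choices.

```python
import random

def simulate_sum_method(tjr_db, n_samples=20000, L=10, alpha=0.2, seed=0):
    """Case 1 noise canceller (j(n) = z(n-2), x(n) = z(n)) with the sum method.

    Returns the mean residual jammer power over the last 5000 samples,
    relative to a unit-power jammer.
    """
    rng = random.Random(seed)
    target_gain = 10.0 ** (tjr_db / 20.0)   # amplitude scale for the target
    beta = 1.0 - 1.0 / L                    # assumed smoothing coefficient
    z = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    w = [0.0] * L                           # adaptive weights start at zero
    x_buf = [0.0] * L                       # x(n), x(n-1), ..., x(n-L+1)
    p_x, p_y = 1.0, 1.0                     # running power estimates (initial guess)
    residual = []
    for n in range(n_samples):
        x_n = z[n]
        j_n = z[n - 2] if n >= 2 else 0.0
        t_n = target_gain * rng.gauss(0.0, 1.0)
        x_buf = [x_n] + x_buf[:-1]
        est = sum(wi * xi for wi, xi in zip(w, x_buf))
        y = (t_n + j_n) - est               # system output = primary - filter output
        # First-order recursive lowpass filters applied to the squared signals.
        p_x = beta * p_x + (1.0 - beta) * x_n * x_n
        p_y = beta * p_y + (1.0 - beta) * y * y
        mu = alpha / (L * (p_x + p_y))      # sum-method step size
        w = [wi + mu * y * xi for wi, xi in zip(w, x_buf)]
        residual.append((j_n - est) ** 2)   # uncancelled jammer power
    tail = residual[-5000:]
    return sum(tail) / len(tail)

gain_low_tjr = simulate_sum_method(-10.0)
gain_high_tjr = simulate_sum_method(10.0)
print(f"input TJR -10 dB: residual jammer gain = {gain_low_tjr:.4f}")
print(f"input TJR +10 dB: residual jammer gain = {gain_high_tjr:.4f}")
```

Because only a single short trial is run per condition, the printed values fluctuate with the seed; averaging over several seeds, as in the ensemble of trials described above, would give smoother estimates.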

The smooth curves in Figs. 4.5 and 4.6 indicate the steady-state performance
predicted by (4.31), (4.43), (4.51), (4.59), and (4.62), and are identical to the curves
shown in Figs. 4.1 and 4.2. Again, both error coefficients were set to 0.05 to calculate
the estimation error, $\Delta$, according to (4.54) for use in (4.51). Clearly, in steady
state there is very good agreement between the simulations and the analytical results.
Simulations of the optimal method for the parameter values used in Fig. 4.5 (not
shown) confirmed that the steady-state performance converges to zero ($-\infty$ dB).

Figure 4-5: Steady-state performance for four methods of computing the step-size
parameter for Case 1. The plot shows the jammer gain due to the system as a function
of the input TJR (dB). The four methods are the traditional method, the
approximation-to-optimal method, the output method, and the sum method. The simulation
results are represented by discrete points, while the solid curves correspond to the
analytical results shown in Fig. 4.1.

Figure 4-6: Steady-state performance for five methods of computing the step-size
parameter for Case 2. The plot shows the jammer gain due to the system as a
function of the input TJR. The five methods are the traditional method, the optimal
method, the approximation-to-optimal method, the output method, and the sum
method. The simulation results are represented by discrete points, while the solid
curves correspond to the analytical results shown in Fig. 4.2.

Footnote on the Case 2 signals: This relation between the primary and reference signals
occurs when a simple two-microphone generalized sidelobe canceller is used to cancel a
directional jammer signal that has a delay of two sampling periods between microphones.

Footnote on signal length: The 300,000-sample length was selected to allow the optimal
method to approach steady state for the condition with the longest convergence time
(Case 2 with input TJR = 20 dB).

Figure 4.7 shows the transient behavior of the sum method for Case 1. The
solid lines are the ensemble average of 50 simulation trials, and the dashed lines are
exponential decays with time constants predicted by the analysis of Sec. 4.3.3. The
dashed lines were generated assuming that the mse decays exponentially to the value
of $J(\infty)$ predicted by (4.62), according to

$$ J(n) = J(\infty) + [J(0) - J(\infty)]\,e^{-n/\tau}. \tag{4.77} $$

This is in contrast to the transient behavior of the steepest descent algorithm shown
in Figs. 4.3 and 4.4, where the mse decays to zero. For an input TJR of -10 dB (top
panel of Fig. 4.7) the time constant is $\tau = 50$ samples, computed according to (4.76),
and for an input TJR of 10 dB (bottom panel of Fig. 4.7) the time constant is
$\tau = 275$ samples, computed according to (4.74). As expected, at low TJR, the
exponential decay predicted by (4.76) gives a conservative estimate of the time constant;
the actual convergence is slightly faster. At high TJR, the exponential decay predicted
by (4.74) closely matches the simulation results. Overall, there is good agreement
between the simulations and the analytical results.

Figure 4-7: Transient behavior for the sum method of computing the step-size parameter
for Case 1, plotted against iteration number. The solid curve is the ensemble average
from computer simulations, and the curve composed of circles is an exponential decay
described by (4.77). The two panels show the behavior for input TJRs of -10 and 10 dB.

4.5 Discussion

4.5.1 Summary of results

The preceding analysis determined an expression for the optimal time-varying step-size
parameter for an adaptive noise canceller using the LMS algorithm to adjust the
adaptive weights. The optimality criterion used was minimization of the total weight
error power. When the algorithm reaches steady state, this criterion is equivalent to
minimization of the excess mse due to the adaptive process. The resulting optimal
method requires knowledge of quantities unavailable to the adaptive system. For
implementation in a practical system, additional assumptions lead to three practical
methods (approximation-to-optimal, output, and sum) for calculating the step-size
parameter.

In  terms  of  steady-state  performance,  the  sum  method  is  clearly  preferable  to  the 
traditional,  approximation-to-optimal,  and  output  methods.  The  performance  of  the 
traditional  method  suffers  in  the  presence  of  strong  target  signals  because  the  excess 
mse  is  proportional  to  target  power.  The  approximation-to-optimal  method  performs 
poorly  at  the  extremes  of  target  power  due  to  its  sensitivity  to  errors  in  the  signal 
power  estimates.  The  output  method  has  excess  mse  independent  of  target  power, 
which  is  unnecessarily  high  when  the  target  signal  is  weak.  The  sum  method  provides 
the  advantages  of  the  traditional  method  in  the  presence  of  weak  targets  (excess  mse 
proportional  to  the  sum  of  the  target  power  and  the  minimum  mse)  and  of  the  output 
method  in  the  presence  of  strong  targets  (constant  excess  mse). 

Analysis of all five methods produced recursive formulas to characterize the transient
behavior for known parameter values. It was shown that although methods
that produce poor steady-state performance at the extremes of TJR converge more
quickly, under conditions leading to good steady-state performance, the practical
methods all exhibit similar transient behavior. Furthermore, a set of exponential decays
was derived that can be applied to approximate the transient behavior of the
sum method.

The surprisingly poor steady-state performance of the approximation-to-optimal
method deserves some discussion. The problem with implementing the
approximation-to-optimal method lies in the difficulty in obtaining good power estimates.
The other methods are not as sensitive to errors in the power estimates because the
estimates are only used to normalize the step-size parameter with respect to power levels.
In the approximation-to-optimal method, the power estimates are subtracted with the
goal of cancelling the target and input jammer powers and obtaining the excess error
power. However, in most cases the powers to be cancelled by the subtraction are much
larger than the desired excess error power, and as a result, relatively large errors in
the estimate of excess error power are likely. For the simulation results presented in
Figs. 4.5 and 4.6, the time constant of the filter used to produce the power estimates
was 100 samples. Additional simulations showed that using longer time constants
did provide some improvement, but the performance was still proportional to target
power.

In addition, the agreement between simulation and analytical results for the
approximation-to-optimal method was not as good as for the other methods. That
can be attributed to the restrictive assumptions about the form of the estimation error,
$\Delta$, that were required to simplify the analysis. Even so, the results of the analysis
based on $\Delta$ successfully predict the basic trends of the simulation results.

4.5.2  Relation  to  other  results 

Some of the expressions derived above can be related to previous work concerned
with the performance of adaptive echo-cancellers for the telephone network (Duttweiler,
1982; Sondhi and Berkley, 1980; Wehrmann et al., 1980; Hoge, 1975). All
of these works arrived at expressions for the optimal step-size parameter analogous
to (4.39), and Wehrmann et al. (1980) and Hoge (1975) considered ways of making
approximations for practical implementation in a real system.

Duttweiler was primarily concerned with the use of nonlinearities in the correlation
multiplier, that is, replacing the product $y(n)\mathbf{x}(n)$ in (4.17) with the product of
arbitrary functions of $y(n)$ and $\mathbf{x}(n)$. A common variation on the LMS algorithm
is to use the sign of $y(n)$, $\mathbf{x}(n)$, or both, in order to eliminate the need for
multiplication in (4.17). Duttweiler's analysis showed that nonlinearities always impair
performance relative to use of the true correlation multiplier.

To make the analysis tractable, Duttweiler made several assumptions similar to those of independence theory. Furthermore, Duttweiler assumed that the adaptive filter is long enough to completely model the echo path, so that $J_{\min} = 0$, and that samples of the input signal, $x(n)$, are independent, so that $\mathbf{R} = \mathbf{I}$. Under these assumptions, Duttweiler derived an expression for the evolution of the expected value of the weight error power (equivalent to $\mathrm{tr}[\mathbf{K}(n)]$) for arbitrary nonlinearities. He evaluated that expression for the true multiplier, $y(n)\mathbf{x}(n)$, and for several nonlinearities, and then determined the optimal step-size parameter for each of those cases.

Three expressions from Duttweiler's analysis of the true correlation multiplier are directly related to expressions given in the current work. Making allowances for the different assumptions, Eqs. 48 and 50 from Duttweiler are closely related to (4.37) and (4.30), describing the evolution of the weight error power and the steady-state weight error power, respectively. In Eq. 85, Duttweiler gives the optimal step-size parameter for the true correlation multiplier, which, using the notation of the current work, is

\[ \mu^* = \frac{\sigma_{ex}^2}{L\sigma_x^2\,\sigma_y^2} \tag{4.78} \]

where $\sigma_{ex}^2$ is the variance of the uncancelled jammer signal, $e_{ex}(n) = \mathbf{v}^T(n)\mathbf{x}(n)$, and $\sigma_y^2$ is the variance of the system output. Since $J_{ex}(n) = E[e_{ex}^2(n)]$, comparing (4.78) to (4.39) reveals that the two expressions are identical except for the use of variances in place of expected values based on ensemble averages. As is true of (4.39), (4.78) cannot be implemented in a real system, because $\sigma_{ex}^2$ is unknown. Duttweiler points out that near-end speech detectors used on the telephone network can be seen as a crude approximation to (4.78).

In a review paper on echo cancellation for the telephone network, Sondhi and Berkley (1980) describe a similar result. Again, using assumptions similar to independence theory, they derive Eq. 34 to describe the evolution of the weight error power that is equivalent to (4.37) presented in this work. In Eq. 35, they present an expression for the step-size parameter that minimizes the weight error power, which is presumably obtained by taking the first derivative of Eq. 34 with respect to the step-size parameter. Correcting what appears to be a typographical error (lack of squaring the term in the numerator) and using the notation of the current work, their Eq. 35 is

\[ \mu^*(n) = \frac{[\mathbf{v}^T(n)\mathbf{x}(n)]^2}{\mathbf{x}^T(n)\mathbf{x}(n)\left(\sigma_t^2 + J_{\min} + [\mathbf{v}^T(n)\mathbf{x}(n)]^2\right)}. \tag{4.79} \]

Noting that $E[(\mathbf{v}^T(n)\mathbf{x}(n))^2] = J_{ex}(n)$ and that $E[\mathbf{x}^T(n)\mathbf{x}(n)] = L\sigma_x^2$ reveals that (4.79) is equivalent to (4.39) except for the use of instantaneous values instead of expectations based on ensemble averages. With regard to (4.79), Sondhi and Berkley state that “this ‘optimum’ value can be estimated by assuming the x's to be i.i.d., and making some further simplifying assumptions.” However, they do not explain what the simplifying assumptions are or how an approximation to (4.79) could be implemented in a real system.

The  approach  described  by  Sondhi  and  Berkley  (1980)  is  based  on  work  by  Hoge 
(1975).  Using  the  notation  of  the  current  work,  Hoge  (1975)  reports  that  the  optimal 
step-size  parameter  is  given  by 


Mn)-  = - ^  (4.80) 

where $\alpha = 1$ and $\hat{\sigma}_t^2(n)$ and $\mathrm{tr}[\hat{\mathbf{K}}(n)]$ are recursive estimates of the target signal power and the weight error power, respectively. Rearranging (4.80) yields


aJcxjn) 
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(4.81) 


where the estimates $\hat{J}_{ex}(n) = \sigma_x^2\,\mathrm{tr}[\hat{\mathbf{K}}(n)]$ and $\hat{E}[y^2(n)] = \hat{\sigma}_t^2(n) + \frac{1}{L}\mathbf{x}^T(n)\mathbf{x}(n)\,\mathrm{tr}[\hat{\mathbf{K}}(n)]$ are based on (4.28) and the assumptions that $J_{\min} = 0$ and $\mathbf{R} = \sigma_x^2\mathbf{I}$. As a result of the latter assumption and (4.27), $J_{ex}(n) = \sigma_x^2\,\mathrm{tr}\,\mathbf{K}(n)$. Comparing (4.40) to (4.81) reveals that the two expressions are the same if the estimates are replaced by their exact values. Hoge proposes estimating $\sigma_t^2$ and $\mathrm{tr}[\mathbf{K}(n)]$ recursively from quantities available to the adaptive processor and from initial estimates of those values at $n = 0$.

Wehrmann et al. (1980) report that the optimal time-varying step-size parameter, using the notation of the current work, is

If $J_{\min} = 0$, using (4.28) in (4.82) and rearranging shows that

\[ \mu^*(n) = \frac{J_{ex}(n)}{\mathbf{x}^T(n)\mathbf{x}(n)\,E[y^2(n)]}, \tag{4.83} \]

which is the same as (4.39) with $L\sigma_x^2$ estimated by $\mathbf{x}^T(n)\mathbf{x}(n)$ at each iteration.

Wehrmann  et  al.  (1980)  note  that  the  optimal  method  cannot  be  implemented  in 
a  real  system  because  the  signals  e(n)  and  t{n)  are  not  available  for  calculating  their 
expected  values  required  in  (4.82).  They  then  propose  an  implementable  method  for 
calculating  the  “noise  insensitive  compromise  (nic)  step-size  factor,”  which  is  analyzed 
in  the  next  section. 


4.5.3  Application  to  other  proposed  modifications  of  the 
LMS  algorithm 

The LMS algorithm is widely used because of its effectiveness, simplicity, and relative ease of implementation. It is therefore not surprising that many researchers have suggested modifications to the LMS algorithm. These modifications are intended for a variety of purposes, including reducing computational burden (Claasen and Mecklenbrauker, 1981; Mathews and Cho, 1987; Sullivan, 1993) and improving the fundamental tradeoff between convergence time and steady-state performance (Harris et al., 1986; Karni and Zeng, 1989; Yasukawa and Shimada, 1993; Makino et al., 1993). Some of these modified LMS algorithms are based on rigorous analysis, while others are ad hoc. Furthermore, some of these methods are only effective in applications of the LMS algorithm where the target signal is weak or nonexistent.

The purpose of this section is to demonstrate how the methods of Sec. 4.3 can be applied to evaluate many of these proposed algorithms. Particular attention is given to the applicability of these modifications to configurations that require the algorithm to perform in the presence of strong target signals, such as the adaptive noise canceller. It should be noted, however, that not all modifications to the LMS algorithm can be evaluated using these methods. In order to analyze an algorithm using this framework, the modification must be expressible in terms of a time-varying step-size parameter, $\mu(n)$, that is applied to the true correlation multiplier, $y(n)\mathbf{x}(n)$, in (4.17). Analysis of arbitrary nonlinear correlation multipliers is considerably more complicated (e.g., Duttweiler, 1982; Sullivan, 1993).
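As an illustration of this requirement, the following Python sketch (not part of the original analysis) implements an adaptive noise canceller in which a modification is supplied as a function returning a time-varying step size that multiplies the true correlation multiplier. It assumes NumPy and an update of the form w(n+1) = w(n) + mu(n) y(n) x(n); the scaling convention of (4.17) may differ by a constant factor. The names `lms_noise_canceller`, `fixed_step`, and `signed_error_step` are hypothetical, and the form alpha/|y(n)| used for the signed-error case is an assumption about scaling.

```python
import numpy as np

def lms_noise_canceller(primary, reference, L, step_size):
    """Adaptive noise canceller with a pluggable time-varying step size.

    step_size(y, x_vec) returns mu(n); the weight update always uses the
    true correlation multiplier y(n) * x(n).
    """
    w = np.zeros(L)
    y = np.zeros(len(primary))
    for n in range(L - 1, len(primary)):
        x_vec = reference[n - L + 1:n + 1][::-1]  # most recent sample first
        y[n] = primary[n] - w @ x_vec             # system output
        mu = step_size(y[n], x_vec)               # time-varying step size
        w = w + mu * y[n] * x_vec                 # assumed update convention
    return y, w

def fixed_step(alpha, L, sigma_x2):
    """Traditional method: constant step size alpha / (L * sigma_x^2)."""
    return lambda y, x_vec: alpha / (L * sigma_x2)

def signed_error_step(alpha, eps=1e-12):
    """Signed-error LMS written as a step size: mu(n) * y(n) = alpha * sign(y(n))."""
    return lambda y, x_vec: alpha / (abs(y) + eps)
```

Any algorithm that can be written as such a `step_size` function fits the framework; a nonlinearity applied to the reference vector itself does not.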

The following three examples illustrate how the analysis performed in Sec. 4.3 can be used to evaluate the noise insensitive compromise (nic) algorithm (Wehrmann et al., 1980), the signed-error LMS algorithm (Claasen and Mecklenbrauker, 1981; Mathews and Cho, 1987) and the LMS algorithm with adaptive damped convergence factor (Karni and Zeng, 1989).


Noise  insensitive  compromise  (nic)  algorithm 

Wehrmann  et  al.  (1980)  propose  the  noise  insensitive  step-size  factor,  given  by 


\[ \mu_{nic}(n) = \frac{\alpha}{X_{\max}\sum_{i=1}^{L}|x(n-i)|} \tag{4.84} \]


where $X_{\max}$ is the maximum peak voltage of telephone speech. They provide simulation results using $\alpha = 2$ and show that it performs better than the traditional method when the target signal is present. Their simulation results are based on performance determined from a 200 ms segment taken after the system has adapted for 2 seconds, and they do not analyze the convergence time.

As is typical of telephone echo cancellers, they assume that the system will include a speech detector to suspend adaptation when the target signal, $t(n)$, consists of near-end speech. As a result, when the system is adapting, the target signal consists only of line noise, and their simulations are restricted to values of TJR less than −20 dB. However, it is interesting to consider the performance of this method at high TJR as well.

The steady-state performance of the nic method can be determined by substituting (4.84) for $\mu$ in (4.30):

(4.85)

If the values of $x(n)$ are zero-mean Gaussian random variables, then $E[|x(n)|] = \sigma_x\sqrt{2/\pi}$ (Mathews and Cho, 1987). Using the Gaussian assumption, replacing $|x(n-i)|$ in (4.85) with its expected value and defining B = gives


•7ex(oo) 
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(4.86) 


Comparing this expression to (4.31) shows that the steady state performance of the nic method is similar to the performance of the traditional method with the dimensionless step-size parameter, $\alpha$, reduced by a factor, $B$, based on the ratio of actual reference input rms level to the maximum possible peak input.

For the nic method, the performance depends on both the TJR and the signal levels relative to $X_{\max}$. For example, the nic method will provide better performance
when  <tI  =  <r^  =  {B  =  |)  than  when  £r|  =  {B  =  |)  even  though 

TJR = 0 dB for both cases. This is in contrast to the performance of the methods proposed in Sec. 4.3, which only depend on the input TJR. For the nic method, the worst case steady state performance occurs when the ratio, $B$, is largest, that is, when the input levels approach $X_{\max}$. Since the primary input must be less than $X_{\max}$ and it is assumed that $\sigma_x^2 = \sigma_j^2$, the worst case occurs when $\sigma_j^2 + \sigma_t^2 = X_{\max}^2$.

The worst case performance can be examined at the two extremes of TJR. At low TJR ($\sigma_j^2 \gg \sigma_t^2$), $B \approx 1$ and (4.86) becomes


Jex(oo) 


(4.87) 


which  is  similar  to  the  performance  of  the  traditional  algorithm  described  by  (4.31). 
At  high  TJR  (erf  erf),  B  fa  ^  and  (4.86)  becomes 
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where  terms  have  been  neglected  based  on  the  observations  that  af  Jmin  and 
of  ^  a.  At  high  TJR,  steady  state  performance  is  proportional  to  the  square  root  of 
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the  target  power,  which  is  considerably  better  than  the  steady  state  performance  of 

the  traditional  method.  However,  the  performance  is  not  as  good  as  that  of  the  sum 

2 

method,  which  approaches  ^  at  high  TJR. 

The above analysis is based on the worst case scenario for the signal levels and $X_{\max}$. Even though the performance at high TJR is proportional to the square root of the target power, for a wide range of signal levels the steady state performance will be better than that suggested by (4.87) and (4.88). The steady state performance of the nic method can always be improved by increasing the value of $X_{\max}$, but that is associated with slower convergence. The tradeoff between steady state performance and convergence time involved in selecting an appropriate value for $X_{\max}$ is similar to that which occurs in selection of the dimensionless step-size parameter, $\alpha$. Choosing a value of $X_{\max}$ that is small produces values of $B$ close to unity, resulting in the steady state performance described by (4.87) and (4.88). Choosing a larger value of $X_{\max}$ produces better steady-state performance but proportionally longer convergence time, which will affect the low TJR cases more adversely. A reasonable choice of $X_{\max}$ can only be made if the range of input signal powers is known. This may be true not only for telephone lines, but also for digital systems that perform analog-to-digital conversion of the input data. In summary, the nic method has some potential for applications when enough information is available a priori to make a reasonable choice of $X_{\max}$ based on information about both the absolute power levels and the range of TJRs over which the system operates.

Signed-error  LMS  algorithm 

When an application requires reduction in the computational complexity of the LMS algorithm, signed algorithms are often used to eliminate the multiplication required by the true correlation multiplier, $y(n)\mathbf{x}(n)$, in (4.17). Sullivan (1993) summarizes the following four signed algorithms, which use these functions to calculate the correlation multiplier:

•  Signed product - $\mathrm{sign}[y(n)x(n)]$

•  Signed regressor - $y(n)\,\mathrm{sign}[x(n)]$

•  Signed error - $\mathrm{sign}[y(n)]\,x(n)$

•  Signed maximum - $\mathrm{sign}[y(n)x(n)]\,\min(|y(n)|, |x(n)|)$.
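The four correlation multipliers listed above can be written compactly as follows. This is a minimal Python sketch assuming NumPy, a scalar output sample `y`, and a reference vector `x` to which the sign and minimum are applied element by element; the function names are hypothetical.

```python
import numpy as np

def signed_product(y, x):
    """sign[y(n) x(n)], element by element."""
    return np.sign(y * x)

def signed_regressor(y, x):
    """y(n) sign[x(n)]."""
    return y * np.sign(x)

def signed_error(y, x):
    """sign[y(n)] x(n)."""
    return np.sign(y) * x

def signed_maximum(y, x):
    """sign[y(n) x(n)] min(|y(n)|, |x(n)|)."""
    return np.sign(y * x) * np.minimum(np.abs(y), np.abs(x))
```

Only the signed-error form leaves the reference vector unchanged, so it is the only one of the four that reduces to a scalar step size applied to the true correlation multiplier.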

The signed-error algorithm can be analyzed within the framework proposed here, because it can be expressed in terms of the traditional LMS algorithm in (4.17) with a time-varying step-size parameter, given by


fi{n)  = 


a 


(4.89) 


Substituting (4.89) into (4.30) gives

If $z$ is a Gaussian random variable with zero mean and variance $\sigma_z^2$, then the mean of its absolute value, $E[|z|]$, equals $\sigma_z\sqrt{2/\pi}$ (Mathews and Cho, 1987). Assuming that $y(n)$ can be modeled as a zero-mean, Gaussian process, applying this assumption to $E[|y(n)|]$ and substituting (4.28) gives

Replacing $|y(n)|$ in (4.90) with its expected value, substituting (4.92) and rearranging yields

The positive solution to (4.93) is

This result is equivalent to the steady-state excess mse for the signed error algorithm determined by Mathews and Cho (1987). This equivalence can be seen by substituting $\alpha = 2\alpha'$ into (4.94), giving

+  a!  Ja'^  Jmin) 

J„(c»)  = - ^ - 2 - 

The expression given by (4.95) can be obtained from the steady-state standard deviation of the error given by Mathews and Cho (1987) by accounting for the target signal power, squaring their Eq. 39, and subtracting $J_{\min}$ to obtain the steady-state excess mse. In the presence of strong target signals, the steady state performance of the signed error algorithm is proportional to the square root of the target signal power. Thus, strong target signals affect the signed error algorithm less adversely than they affect the traditional algorithm, where the steady-state excess mse is proportional to the target signal power.

LMS  algorithm  with  adaptive  damped  convergence  factor 

Karni and Zeng (1989) suggest a modification to the LMS algorithm intended to overcome the fundamental tradeoff between convergence time and steady-state error. Their method provides both rapid convergence and reduced misadjustment by allowing the step-size parameter to have a large initial value, and then progressively reducing the step-size parameter as the error signal converges. Specifically, they adjust the step-size parameter according to

=  ^(1  -  (4.96) 

where $e(n)$ is the error signal, $\beta$ is a damping parameter, and $\|\cdot\|$ denotes the vector norm. They suggest that the vector norm can be replaced by the vector norm squared for ease of computation.

It appears that this approach will work as desired in applications with no (or weak) target signals. However, for the adaptive noise canceller, the error signal $e(n)$ is obtained from the system output, $y(n)$, which is a poor estimate of the output error in the presence of strong target signals. Therefore, strong target signals will cause the step-size parameter computed according to (4.96) to increase, leading to increased steady-state mse. In order to gain insight into the performance of this method in the steady-state, it is assumed that $y(n)$ and $\mathbf{x}(n)$ are uncorrelated (which is valid if the weights are converged to their optimal values so that $E[y^2(n)] \approx J_{\min} + \sigma_t^2$). Using the vector norm squared and replacing $\|\mathbf{x}\|^2$ with its expected value, $L\sigma_x^2$, yields

/*«,(»)  =  (4.97) 

Substituting  (4.97)  into  (4.30)  produces 

When  the  target  signal  is  strong,  so  that  the  exponential  term  approaches  zero,  (4.98) 
becomes 

«7ex(oo)  =  Jmin  +  O’t  •  (4.99) 

Like the traditional method, this method of adjusting the step-size parameter causes the expected value of the excess mse to be proportional to the target signal power, leading to poor performance in the presence of strong target signals.

Chapter 5

Intermicrophone correlation for target-to-jammer ratio hypothesis test


5.1  Introduction 

As discussed in Sec. 2.3.1, the performance of adaptive systems degrades at high target-to-jammer ratios. Greenberg and Zurek (1992) proposed computing a running measure of intermicrophone correlation, a metric related to the short-term TJR, and then inhibiting the adaptive process when the correlation exceeds some threshold. This is appropriate for the hearing-aid application because speech signals exhibit a high degree of power fluctuations, so the short-term TJR will contain pauses that allow adaptation even when the long-term TJR exceeds the threshold.

The  purpose  of  this  chapter  is  to  investigate  controlling  adaptation  based  on  the 
correlations  between  pairs  of  microphones.  The  basic  idea  is  to  compute  running 
measurements  of  correlation  for  pairs  of  microphones  for  each  iteration  of  the  process. 
These  correlation  measures  are  combined  and  compared  to  a  threshold  value.  If  the 
correlation  measure  exceeds  the  threshold,  then  the  decision  is  that  the  TJR  is  “high” 
and  the  adaptive  weights  remain  at  their  previous  values.  If  the  correlation  measure 
is less than the threshold, then the decision is that the TJR is “low” and the adaptive
weights are updated.

In  this  approach,  the  threshold  is  applied  to  the  correlation  measure,  and  the 
decision  corresponds  to  whether  or  not  the  TJR  falls  within  one  of  two  ranges.  This  is 
a  simpler  problem  than  attempting  to  estimate  the  TJR  from  the  correlation  measure. 
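The decision rule described above can be sketched in Python as follows. This is an illustrative sketch only: it assumes NumPy, a single pair of microphone signals, first-order recursive averaging with a hypothetical `time_constant` in samples, and normalization by the geometric mean of the two running powers. The correlation estimator, the normalization, and the function name `adaptation_mask` are assumptions, not the specific measure defined later in this chapter.

```python
import numpy as np

def adaptation_mask(m1, m2, threshold, time_constant=100):
    """Return a boolean array: True where adaptation is allowed (TJR judged "low")."""
    lam = 1.0 - 1.0 / time_constant        # forgetting factor of the running averages
    p12 = p11 = p22 = 0.0
    adapt = np.zeros(len(m1), dtype=bool)
    for n in range(len(m1)):
        # Running estimates of the cross power and the two signal powers.
        p12 = lam * p12 + (1.0 - lam) * m1[n] * m2[n]
        p11 = lam * p11 + (1.0 - lam) * m1[n] ** 2
        p22 = lam * p22 + (1.0 - lam) * m2[n] ** 2
        denom = np.sqrt(p11 * p22)
        rho = p12 / denom if denom > 0 else 0.0
        # Correlation below the threshold: decide "low" TJR and update the weights.
        adapt[n] = rho < threshold
    return adapt
```

In a complete system the mask would gate the weight update at each iteration, leaving the weights at their previous values whenever the entry is False.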

Other researchers have considered similar mechanisms to permit adaptation in intervals of low TJR and prevent adaptation in intervals of high TJR. Those methods rely on measures of the energy in the input signals to determine whether adaptation should be enabled or disabled. The methods proposed by Van Compernolle (1990a,b) and by Harrison et al. (1986) rely on the assumption that the long-term TJR is always positive. The methods described by Kaneda and Ohga (1986) and by Sondhi and Berkley (1980) are based on additional information about the presence or absence of the target signal that is not available in the hearing aid application.

Kompis (1993) proposed the sigma-delta method to control adaptation of a two-microphone generalized sidelobe canceller.¹ This method uses the ratio of power in the primary signal to the sum of the powers in the primary and reference signals. He evaluated this method and compared it to two other methods, the intermicrophone correlation used by Greenberg and Zurek (1992) and a multidimensional correlation method based on the cross-correlation function between the two microphone signals at time lags ranging from −0.8 ms to +0.8 ms. For the conditions used in the evaluation, the multidimensional correlation provides the best results, but also requires the most computation. In the remainder of his work, Kompis uses the sigma-delta method because it provides acceptable performance at low computational complexity.

This chapter investigates controlling adaptation at high TJR based on the correlation between microphones as proposed by Greenberg and Zurek (1992). It considers previously neglected issues such as criteria for selecting the correlation threshold, the effect of reverberation, and incorporating information from multiple pairs of microphones. The following section contains an analysis of the relationship between the intermicrophone correlation and the TJR; the intermicrophone correlation is used as a decision variable for choosing between the “low” and “high” TJR hypotheses. It is followed by simulations demonstrating its effectiveness.

¹ This method is also described in Dillier et al. (1993).

5.2  Analysis  of  intermicrophone  correlation  for 
determining  TJR 

This section contains an analysis of the intermicrophone correlation as a means of determining TJR. It starts with a derivation of the probability density function (pdf) of the intermicrophone correlation for a single directional source arriving from a range of angles. The pdf for a single source is then used to derive the pdf for two directional sources arriving from different ranges of angles, conditioned on the relative strengths of the sources (TJR). It is assumed that within the ranges of angles defining target and jammer sources, all angles of incidence are equally likely. Next the effect of reverberation is included in the pdf. Then, binary hypothesis testing (Van Trees, 1968) is used to determine a threshold on the correlation that corresponds to the desired ranges of TJR. Several methods are proposed for combining the correlation measures from different pairs of microphones.


5.2.1  Probability  density  functions 

Correlation  of  one  directional  source 

To derive the pdf of the intermicrophone correlation for a single source, consider one source arriving from an unknown angle of azimuth in the horizontal plane, denoted $\theta$. If it is assumed that all angles of incidence are equally likely, $\theta$ can be treated as a random variable with uniform density on the interval $\theta_1 < \theta < \theta_2$. Under this assumption, the probability density function is

\[ f_\theta(\theta) = \frac{1}{\theta_2 - \theta_1} \tag{5.1} \]

for $\theta_1 < \theta < \theta_2$.

Assuming plane-wave propagation, the time delay between the signals arriving at two microphones in free space is

\[ \tau = \frac{d}{c}\sin\theta \tag{5.2} \]

where $d$ is the intermicrophone spacing, $c$ is the speed of sound, and $\theta$ is zero when the source is oriented broadside to the two microphones. The normalized correlation coefficient, $\rho_s$, between the two microphone signals for a pure tone source of frequency $f$ arriving with time delay $\tau$ is

\[ \rho_s = \cos(2\pi f\tau) \tag{5.3} \]

(Cremer and Muller, 1982), and substituting (5.2) gives

\[ \rho_s = \cos(kd\sin\theta), \tag{5.4} \]

where the wavenumber $k$ is

\[ k = \frac{2\pi f}{c}. \tag{5.5} \]

The pdf for $\rho_s$ can be derived from (5.1) and (5.4) using standard methods for deriving pdfs of functions of random variables. The resulting pdf for the intermicrophone correlation of a single narrowband directional source is

\[ f_{\rho_s}(\rho) = \frac{1}{(\theta_2 - \theta_1)\sqrt{(kd)^2 - (\arccos\rho)^2}\,\sqrt{1-\rho^2}} \tag{5.6} \]

for $\cos(kd\sin\theta_2) < \rho < \cos(kd\sin\theta_1)$.
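The single-source pdf in (5.6) can be checked numerically against the mapping in (5.4). The following Python sketch assumes NumPy and angles chosen so that $kd\sin\theta$ stays between 0 and $\pi$, which keeps the mapping from angle to correlation monotonic; the function names are hypothetical.

```python
import numpy as np

def pdf_single_source(rho, kd, theta1, theta2):
    """Eq. (5.6): pdf of the correlation for a source uniform on (theta1, theta2).

    Valid only for cos(kd sin(theta2)) < rho < cos(kd sin(theta1)).
    """
    rho = np.asarray(rho, dtype=float)
    return 1.0 / ((theta2 - theta1)
                  * np.sqrt(kd ** 2 - np.arccos(rho) ** 2)
                  * np.sqrt(1.0 - rho ** 2))

def sample_correlation(kd, theta1, theta2, size, rng):
    """Draw angles uniformly, as in (5.1), and map them through (5.4)."""
    theta = rng.uniform(theta1, theta2, size)
    return np.cos(kd * np.sin(theta))
```

A histogram of `sample_correlation` output can be compared with `pdf_single_source` over any interior interval of the valid range; the density grows without bound as the correlation approaches 1, so comparisons are best made away from that endpoint.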

Now consider the cases of target and jammer individually. The target signal is defined as any source arriving at the microphones from a range of angles near straight ahead (zero degrees azimuth in the horizontal plane). Because the cosine function is even, positive and negative angles produce the same values of intermicrophone correlation. Therefore, negative angles can be ignored and the range of angles for the target signal is restricted to $0 < \theta < \theta_0$, where $\theta_0$ is a relatively small angle. Substituting $\theta_1 = 0$ and $\theta_2 = \theta_0$ into (5.6) produces the pdf for a directional target signal,

\[ f_{\rho_t}(\rho) = \frac{1}{\theta_0\sqrt{(kd)^2 - (\arccos\rho)^2}\,\sqrt{1-\rho^2}} \tag{5.7} \]

for $\cos(kd\sin\theta_0) < \rho < 1$.

Similarly, the jammer signal is defined as any source arriving at the microphones from any direction outside the range of target angles. Because the correlation does not distinguish between signals arriving from the front and the rear, for this analysis, signals arriving within $\theta_0$ of 180° are not included in the definition of jammer. Furthermore, because of the symmetry of the sine function, angles larger than 90° can be ignored along with the negative angles. Substituting $\theta_1 = \theta_0$ and $\theta_2 = \frac{\pi}{2}$ into (5.6) produces the pdf for a directional jammer signal,

\[ f_{\rho_j}(\rho) = \frac{1}{\left(\frac{\pi}{2} - \theta_0\right)\sqrt{(kd)^2 - (\arccos\rho)^2}\,\sqrt{1-\rho^2}} \tag{5.8} \]

for $\cos(kd) < \rho < \cos(kd\sin\theta_0)$.

The pdfs for a single directional target and a single directional jammer described by (5.7) and (5.8) are shown in Fig. 5.1 for $\theta_0 = \arcsin(\frac{1}{4}) = 14.5°$ and for several values of $kd$.

Figure 5-1: Probability density functions for the correlation of single source described by (5.7) and (5.8) for three values of kd. [Panels: pdf for target; pdf for jammer.]

Next, consider the case of two independent directional sources, one target and one jammer. The total intermicrophone correlation, $\rho_d$, is the normalized, weighted sum of the target and jammer correlations, given by

\[ \rho_d = \frac{\sigma_t^2\rho_t + \sigma_j^2\rho_j}{\sigma_t^2 + \sigma_j^2} = \frac{Y\rho_t + \rho_j}{Y + 1} \tag{5.9} \]

where $\sigma_t^2$ and $\sigma_j^2$ are the target and jammer signal powers, and $Y = \sigma_t^2/\sigma_j^2$, the target-to-jammer ratio. The pdf of the sum of two independent random variables is the convolution of their two pdfs. Since $\rho_t$ and $\rho_j$ are random variables with known pdfs, the pdf of $\rho_d$ conditioned on $Y$, denoted $f_{\rho_d|Y}(\rho|Y)$, can be determined by convolving (5.7) and (5.8), with appropriate scaling by $\frac{Y}{Y+1}$ and $\frac{1}{Y+1}$. This is shown easily with the use of the intermediate variables:

\[ \rho_t' = \frac{Y}{Y+1}\rho_t \tag{5.10} \]

\[ \rho_j' = \frac{1}{Y+1}\rho_j \tag{5.11} \]

\[ \rho_d = \rho_t' + \rho_j'. \tag{5.12} \]

Then the pdfs are

\[ f_{\rho_t'|Y}(\rho|Y) = \frac{Y+1}{Y} f_{\rho_t}\!\left(\frac{Y+1}{Y}\rho\right), \tag{5.13} \]

\[ f_{\rho_j'|Y}(\rho|Y) = (Y+1) f_{\rho_j}\big((Y+1)\rho\big), \tag{5.14} \]

and

\[ f_{\rho_d|Y}(\rho|Y) = f_{\rho_t'|Y}(\rho|Y) * f_{\rho_j'|Y}(\rho|Y) = \frac{Y+1}{Y} f_{\rho_t}\!\left(\frac{Y+1}{Y}\rho\right) * (Y+1) f_{\rho_j}\big((Y+1)\rho\big). \tag{5.15} \]
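Because (5.15) is a convolution of two scaled densities, its shape can also be explored by direct sampling of (5.9). The following Python sketch is a Monte Carlo illustration, not the closed-form result; it assumes NumPy, independent target and jammer angles uniform on $(0, \theta_0)$ and $(\theta_0, \pi/2)$, and a hypothetical function name.

```python
import numpy as np

def sample_direct_correlation(Y, kd, theta0, size, rng):
    """Draw samples of rho_d for one target and one jammer at TJR Y (power ratio)."""
    theta_t = rng.uniform(0.0, theta0, size)          # target angles
    theta_j = rng.uniform(theta0, np.pi / 2, size)    # jammer angles
    rho_t = np.cos(kd * np.sin(theta_t))              # eq. (5.4) for the target
    rho_j = np.cos(kd * np.sin(theta_j))              # eq. (5.4) for the jammer
    return (Y * rho_t + rho_j) / (Y + 1.0)            # eq. (5.9)
```

A normalized histogram of the returned samples approximates the conditional pdf for the chosen $Y$ and $kd$. By construction the samples lie between $[Y\cos(kd\sin\theta_0) + \cos(kd)]/(Y+1)$ and $[Y + \cos(kd\sin\theta_0)]/(Y+1)$.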

Closed form solutions could not be found for the integrals that result from substituting (5.7) and (5.8) into (5.15), so the following approximation was used. From Fig. 5.1, it can be seen that a reasonable approximation for $f_{\rho_t}(\rho)$ is given by a constant over the range $\cos(kd\sin\theta_0) < \rho < 1$ plus an impulse at $\rho = 1$. The constant is selected so that the area under the constant portion equals $\frac{1}{2}$ and the area of the impulse is $\frac{1}{2}$.² This approximation for the target pdf is given by

\[ f_{\rho_t}(\rho) \approx \frac{1}{2\,[1 - \cos(kd\sin\theta_0)]} + \frac{1}{2}\,\delta(\rho - 1) \tag{5.16} \]

for $\cos(kd\sin\theta_0) < \rho \le 1$, where $\delta$ is the unit impulse function. Figure 5.2 shows a comparison between this approximation and the pdf described by (5.7) for $\theta_0 = 14.5°$ and several values of $kd$.

² These values were chosen because the target pdf evaluated at the lower limit on $\rho$ [$\rho_{t0} = \cos(kd\sin\theta_0)$] is close to $\frac{1}{2[1-\cos(kd\sin\theta_0)]}$ for a range of values of $\theta_0$ and $kd$, in particular for $10° < \theta_0 < 15°$ and $0 < kd < \pi$.

Figure 5-2: Original and approximate probability density functions for the target-source correlation described by (5.7) and (5.16) for three values of kd.

The conditional pdf is obtained by substituting (5.16) and (5.8) into (5.15). The resulting convolution is performed in Appendix A, and the results are reproduced here:


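\[
\begin{aligned}
f_{\rho_d|Y}(\rho|Y) = {} & \frac{Y+1}{2Y\left(\frac{\pi}{2}-\theta_0\right)\left(1-\cos(kd\sin\theta_0)\right)} \Bigg\{
\left[\frac{\pi}{2} - \arcsin\!\left(\frac{\arccos\big((Y+1)\rho - Y\cos(kd\sin\theta_0)\big)}{kd}\right)\right]\big[u(\rho-\rho_1) - u(\rho-\rho_{2a})\big] \\
& + \Bigg[\frac{Y\left(1-\cos(kd\sin\theta_0)\right)}{\sqrt{(kd)^2 - \big(\arccos(\rho(Y+1)-Y)\big)^2}\,\sqrt{1-\big(\rho(Y+1)-Y\big)^2}}
+ \arcsin\!\left(\frac{\arccos\big((Y+1)\rho - Y\big)}{kd}\right) \\
& \qquad - \arcsin\!\left(\frac{\arccos\big((Y+1)\rho - Y\cos(kd\sin\theta_0)\big)}{kd}\right)\Bigg]\big[u(\rho-\rho_{2a}) - u(\rho-\rho_{3a})\big] \\
& + \Bigg[\frac{Y\left(1-\cos(kd\sin\theta_0)\right)}{\sqrt{(kd)^2 - \big(\arccos(\rho(Y+1)-Y)\big)^2}\,\sqrt{1-\big(\rho(Y+1)-Y\big)^2}}
+ \arcsin\!\left(\frac{\arccos\big((Y+1)\rho - Y\big)}{kd}\right) - \theta_0\Bigg]\big[u(\rho-\rho_{3a}) - u(\rho-\rho_4)\big]\Bigg\}
\end{aligned}
\tag{5.17}
\]

for $Y < Y_0$ and

\[
\begin{aligned}
f_{\rho_d|Y}(\rho|Y) = {} & \frac{Y+1}{2Y\left(\frac{\pi}{2}-\theta_0\right)\left(1-\cos(kd\sin\theta_0)\right)} \Bigg\{
\left[\frac{\pi}{2} - \arcsin\!\left(\frac{\arccos\big((Y+1)\rho - Y\cos(kd\sin\theta_0)\big)}{kd}\right)\right]\big[u(\rho-\rho_1) - u(\rho-\rho_{2b})\big] \\
& + \left(\frac{\pi}{2} - \theta_0\right)\big[u(\rho-\rho_{2b}) - u(\rho-\rho_{3b})\big] \\
& + \Bigg[\frac{Y\left(1-\cos(kd\sin\theta_0)\right)}{\sqrt{(kd)^2 - \big(\arccos(\rho(Y+1)-Y)\big)^2}\,\sqrt{1-\big(\rho(Y+1)-Y\big)^2}}
+ \arcsin\!\left(\frac{\arccos\big((Y+1)\rho - Y\big)}{kd}\right) - \theta_0\Bigg]\big[u(\rho-\rho_{3b}) - u(\rho-\rho_4)\big]\Bigg\}
\end{aligned}
\tag{5.18}
\]

for $Y > Y_0$, where

\[ Y_0 = \frac{\cos(kd\sin\theta_0) - \cos(kd)}{1 - \cos(kd\sin\theta_0)}. \tag{5.19} \]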

Figure 5.3 shows the conditional pdf $f_{\rho_d|Y}(\rho|Y)$ given by (5.17) or (5.18) for several values of $Y$ and $kd$.

Figure 5-3: Probability density functions for the correlation of two sources described by (5.17) and (5.18), conditioned on TJR and for three values of kd. [Panels: conditional pdf for TJR = −10 dB, TJR = 0 dB, and TJR = 10 dB; horizontal axis: rho.]

5.2.2  Correlation  in  reverberation 

The  above  derivation  of  the  pdfs  only  considered  the  direct  target  and  jammer  signals. 
The  next  step  is  to  include  reverberation. 

As a first approximation, assume that the reverberant portion of the signals can be modeled as a diffuse sound field.⁴ With this assumption, the direct and reverberant portions of the signal will be independent. As seen earlier for combining correlations of independent target and jammer, the total correlation, $\rho_{tot}$, is the normalized, weighted sum of the direct and reverberant correlations, given by

$$\rho_{tot} = \frac{\sigma_d^2\rho_d + \sigma_r^2\rho_r}{\sigma_d^2 + \sigma_r^2} = \frac{W\rho_d + \rho_r}{W+1}, \tag{5.20}$$

where $\sigma_d^2$ and $\sigma_r^2$ are the direct and reverberant signal powers, and $W = \sigma_d^2/\sigma_r^2$ is the direct-to-reverberant ratio. The intermicrophone correlation of a diffuse sound field is

$$\rho_r = \frac{\sin(kd)}{kd} \tag{5.21}$$

(Cremer and Muller, 1982). Note that this applies to the reverberant portion of both the target and jammer signals. Furthermore, for a particular value of $kd$, the correlation of the diffuse field is a constant. Substituting (5.21) into (5.20) gives

$$\rho_{tot} = \frac{W\rho_d + \frac{\sin(kd)}{kd}}{W+1}. \tag{5.22}$$

⁴ Note that this approach neglects early reflections.


Treating the direct-to-reverberant ratio, $W$, as an unknown constant, the total correlation, $\rho_{tot}$, is a random variable with a conditional pdf related to the conditional pdf of the direct correlation $\rho_d$, given by (5.17) and (5.18). The effects of $\rho_r$ and $W$ are to shift the range of $\rho_{tot}$ over which the pdf is nonzero by $\rho_r/(W+1)$ and to reduce the range of $\rho_{tot}$ over which the pdf is nonzero by a factor of $W/(W+1)$.

Figure 5.4 shows the relationship between $\rho_d$ and $\rho_{tot}$ given by (5.22). Each plot in Fig. 5.4 shows the relationship between the direct correlation and the total correlation for a single value of $kd$ and five values of $W$ ranging from 0.1 to 10 (-10 dB to +10 dB). The slope of each line is $W/(W+1)$, which approaches zero for small $W$ and approaches one for large $W$. For each value of $kd$, the lines for different values of $W$ all intersect at $\rho_{tot} = \rho_d = \sin(kd)/kd$, which can be verified by substituting $\rho_d = \sin(kd)/kd$ into (5.22). This result has important implications for threshold selection, as discussed in the next section.
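The behavior of (5.22) described above can be checked numerically. The following is a minimal sketch, not part of the original analysis; the function names and the value kd = π/2 are illustrative assumptions. It evaluates the slope W/(W+1) and confirms that every line passes through the point where the direct and total correlations both equal sin(kd)/kd.

```python
import math


def diffuse_correlation(kd):
    """Correlation of a diffuse sound field between two microphones, sin(kd)/kd (5.21)."""
    return 1.0 if kd == 0 else math.sin(kd) / kd


def total_correlation(rho_d, w, kd):
    """Total correlation (5.22); w is the direct-to-reverberant ratio as a linear power ratio."""
    return (w * rho_d + diffuse_correlation(kd)) / (w + 1.0)


kd = math.pi / 2  # illustrative choice, not a value singled out in the text
rho_r = diffuse_correlation(kd)
for w in (0.1, 0.3, 1.0, 3.0, 10.0):
    # The mapping is linear in rho_d, so the slope is the change over a unit step.
    slope = total_correlation(1.0, w, kd) - total_correlation(0.0, w, kd)
    fixed = total_correlation(rho_r, w, kd)
    print(f"W = {w:4.1f}: slope = {slope:.3f}, rho_tot at rho_d = rho_r is {fixed:.3f}")
```

The common intersection point is what makes a threshold placed at sin(kd)/kd insensitive to the amount of reverberation.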


Figure 5-4: Relationship between correlation with and without reverberation as described by (5.22) for five values of direct-to-reverberant ratio ($W$ = 0.1, 0.3, 1, 3, 10) and for three values of $kd$. [Abscissa: $\rho_d$.]


5.2.3  Hypothesis  testing 

Binary  hypothesis  test 


As discussed in Sec. 5.1, the goal is to use the correlation measure as the observation in a binary hypothesis test (Van Trees, 1968). The two hypotheses are TJR < 0 dB ($H_0$ = “low” TJR) and TJR > 0 dB ($H_1$ = “high” TJR), although the following approach could be applied to any pair of ranges of TJR. The cutoff is set to 0 dB because, as discussed in Sec. 2.3.1, the detrimental effects of misalignment and misadjustment are proportional to TJR and are typically noticeable at positive values of TJR.

Correctly determining that $H_0$ is true (correctly saying TJR < 0 dB) will be referred to as a detection and results in adapting under the desired circumstances. Incorrect determination of $H_0$ (saying that TJR < 0 dB when actually TJR > 0 dB) is a false alarm and results in adapting when it is undesirable. Incorrect determination of $H_1$ (saying that TJR > 0 dB when actually TJR < 0 dB) is a miss and results in not adapting under circumstances when adapting was desirable.

Of the two types of errors, false alarms are potentially more damaging than misses, because adapting when it is undesirable may degrade the signal, while not adapting when it is desirable only slows the convergence of the adaptive weights. Selecting the threshold on the correlation to distinguish between the two hypotheses controls the tradeoff between misses and false alarms.⁵ Because of the nature of this tradeoff, it is difficult to quantify the costs associated with these two types of errors. Qualitatively, it is reasonable to permit a relatively high rate of misses in order to obtain a lower rate of false alarms.

In previous sections, the TJR, $Y$, has been considered an unknown constant, but in order to formulate the problem as a hypothesis test, it is necessary to assume a known distribution. Obviously, some degree of approximation is required in making such an assumption. Previous studies have shown that conversations in noisy environments often occur at long-term TJRs of 1-5 dB (Plomp, 1977; Teder, 1990). Short-term fluctuations in speech level typically range from 18 dB above to 12 dB below the average power level (Kryter, 1962). In the following analysis, it is assumed that the TJR is evenly distributed on the range -20 dB to +20 dB, so that

$$f_U(U) = \frac{1}{40} \tag{5.23}$$

for -20 dB < $U$ < +20 dB, where

$$U = 10\log_{10} Y. \tag{5.24}$$

The sensitivity of the analysis to this assumption is considered below. Note that one result of this assumption is that the two hypotheses, $H_0$ and $H_1$, are equally likely.

⁵ This tradeoff between convergence and signal degradation is similar to the one inherent in selection of the step-size parameter, $\mu$, in (4.17), where smaller values reduce misadjustment at the cost of longer convergence times.

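First considering the case of no reverberation, the pdf of $\rho_d$ conditioned on each of the two hypotheses $H_0$ and $H_1$ is found by integrating the conditional pdf given by (5.17) and (5.18) over the corresponding range of TJR, that is,

$$f_{\rho_d|H_0}(\rho|H_0) = \frac{1}{20}\int_{-20}^{0} f_{\rho_d|Y}(\rho|Y)\,dU \tag{5.25}$$

$$f_{\rho_d|H_1}(\rho|H_1) = \frac{1}{20}\int_{0}^{+20} f_{\rho_d|Y}(\rho|Y)\,dU \tag{5.26}$$

where $Y = 10^{U/10}$ and the factor 1/20 is the density of $U$ within each 20-dB range, which follows from (5.23). These two integrations were performed numerically using the trapezoidal rule to produce the curves shown in Fig. 5.5.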

For a particular threshold, $\rho_0$, the probability of detection is

$$P_d = \int_{-1}^{\rho_0} f_{\rho_d|H_0}(\rho|H_0)\,d\rho. \tag{5.27}$$

Similarly, the probability of false alarm is

$$P_f = \int_{-1}^{\rho_0} f_{\rho_d|H_1}(\rho|H_1)\,d\rho, \tag{5.28}$$

while the probability of a miss is

$$P_m = 1 - P_d. \tag{5.29}$$
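As a rough cross-check on the detection and false alarm probabilities defined above, they can also be estimated by Monte Carlo sampling instead of by integrating the analytic pdfs. The sketch below is a swapped-in technique, not the trapezoidal integration used in the text. It assumes a target angle uniformly distributed between 0 and θ₀, a jammer angle uniformly distributed between θ₀ and 90°, θ₀ = 14.5°, kd = π, and TJR uniform in dB; all function names are hypothetical. Because it does not use the approximate target pdf of (5.16), its estimates should not be expected to match the analytic values exactly.

```python
import math
import random


def direct_correlation(theta_t, theta_j, tjr, kd=math.pi):
    """Power-weighted correlation of one target and one jammer.

    Angles are in radians; tjr is the linear target-to-jammer power ratio Y.
    """
    rho_t = math.cos(kd * math.sin(theta_t))
    rho_j = math.cos(kd * math.sin(theta_j))
    return (tjr * rho_t + rho_j) / (tjr + 1.0)


def estimate_rates(threshold, n=100_000, theta0_deg=14.5, kd=math.pi, seed=0):
    """Return Monte Carlo estimates (P_d, P_f) for a given correlation threshold."""
    rng = random.Random(seed)
    theta0 = math.radians(theta0_deg)
    below = {True: 0, False: 0}   # counts of rho < threshold, keyed by "TJR is low"
    trials = {True: 0, False: 0}
    for _ in range(n):
        u_db = rng.uniform(-20.0, 20.0)           # assumed uniform TJR in dB
        low_tjr = u_db < 0.0                      # hypothesis H0 is true
        rho = direct_correlation(
            rng.uniform(0.0, theta0),             # assumed target angle distribution
            rng.uniform(theta0, math.pi / 2.0),   # assumed jammer angle distribution
            10.0 ** (u_db / 10.0),
            kd,
        )
        trials[low_tjr] += 1
        below[low_tjr] += rho < threshold
    return below[True] / trials[True], below[False] / trials[False]


p_d, p_f = estimate_rates(threshold=0.0)
print(f"P_d ~ {p_d:.3f}, P_f ~ {p_f:.3f}, P_m ~ {1.0 - p_d:.3f}")
```

Sweeping `threshold` over the interval from -1 to 1 and plotting the resulting pairs would trace out an empirical ROC curve under these assumptions.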
Figure 5-5: Probability density functions for the correlation of two sources described by (5.25) and (5.26), conditioned on the hypotheses $H_0$ and $H_1$ described in the text, for three values of $kd$.

Figure 5-6: Receiver operating characteristic (ROC) for binary hypothesis testing for three values of $kd$. The points labeled with ‘x’ indicate the performance when the threshold, $\rho_0$, is zero, and the remaining labeled points indicate the performance when $\rho_0$ = 0.1, 0.2, ..., 0.8.

Again performing these integrations numerically for values of $\rho_0$ in the range $-1 < \rho_0 < 1$ and plotting $P_d$ versus $P_f$ produces the receiver operating characteristic (ROC) curves shown in Fig. 5.6. Figure 5.6 shows that for any choice of threshold, the best performance is obtained with $kd = \pi$.

Substituting (5.5) into the choice of $kd = \pi$ and rearranging results in

$$f = \frac{c}{2d}. \tag{5.30}$$

For each microphone spacing, the correlation measure providing the most accurate decision of the range of TJR is produced at a different frequency; for a microphone separation of $d$ = 7 cm, that frequency is $f$ = 2464 Hz. Based on the ROC curves of Fig. 5.6, frequencies selected based on $kd = \pi$, and therefore (5.30), will be used in the remainder of this work.
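The frequency selection rule (5.30) is simple enough to verify directly. The sketch below assumes the standard wavenumber relation k = 2πf/c and a sound speed of 345 m/s; the latter is an assumption inferred from the quoted 2464 Hz at 7 cm, not a value stated in this section.

```python
import math

SPEED_OF_SOUND = 345.0  # m/s; assumed value, consistent with 2464 Hz at d = 7 cm


def center_frequency(d, c=SPEED_OF_SOUND):
    """Frequency (Hz) at which kd = pi for microphone spacing d (m), from (5.30)."""
    return c / (2.0 * d)


def kd_product(f, d, c=SPEED_OF_SOUND):
    """Product kd, assuming the wavenumber k = 2*pi*f/c."""
    return 2.0 * math.pi * f * d / c


f = center_frequency(0.07)
print(f"f = {f:.0f} Hz, kd = {kd_product(f, 0.07):.4f}")
```

Halving the spacing doubles the selected frequency, which is why microphone pairs with different spacings yield correlation measurements in different frequency bands.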

Next, consider the effect of reverberation, as described by (5.22). The presence of reverberation causes the conditional pdfs shown in Fig. 5.5 to be shifted and scaled along the abscissa, because (5.25) and (5.26) are now integrals of a random variable shifted and scaled as in (5.22). However, since the magnitude of the scaling and shifting is identical for the pdfs corresponding to both hypotheses, the ROC curves in Fig. 5.6 are unchanged for any level of reverberation.⁶ Therefore, if the level of reverberation is known, then the transformation between $\rho_d$ and $\rho_{tot}$ is known and it is possible to select a threshold, $\rho_0$, that will produce performance corresponding to a desired point on the ROC curve when applied to $\rho_{tot}$. However, when the level of reverberation is unknown, a single threshold, $\rho_0$, applied to the total correlation, $\rho_{tot}$, maps to a range of values of $\rho_d$, corresponding to a range of values on the ROC curve. This issue is discussed in the next section.

Finally, the sensitivity of this analysis to the assumption that the TJR is evenly distributed between -20 and +20 dB is considered. The two curves in Fig. 5.7(a) show $P_d$ and $P_f$ versus threshold for $kd = \pi$. (The ROC curve in Fig. 5.6 was produced from the same values of $P_d$ and $P_f$.) The curves in Fig. 5.7(b) show the probability that the correlation is below the threshold, versus threshold, for individual values of TJR. When the TJR is less than 0 dB, the curve corresponds to $P_d$ for that value of TJR. Similarly, when the TJR exceeds 0 dB, the curve corresponds to $P_f$ for that value of TJR. The two curves in Fig. 5.7(a) can be thought of as the integrals of all such single-TJR curves within the two ranges of -20 dB < TJR < 0 dB and 0 dB < TJR < 20 dB. Because of the similar shapes of these curves and the relative symmetry of the contributions at positive and negative TJRs, the ROC curves shown in Fig. 5.6 are relatively robust to violations of the assumption that the TJR is evenly distributed between -20 and +20 dB.

⁶ Theoretically, this is true for any level of reverberation. In practice, in extreme reverberation (as the direct-to-reverberant ratio, $W$, approaches zero) the location on the ROC curve becomes increasingly sensitive to the threshold value, and the shape of the ROC curve becomes increasingly sensitive to the assumptions regarding the probability distributions.

Figure 5-7: (a) Cumulative probability distributions for the correlation of two sources with two ranges of TJR: -20 dB < TJR < 0 dB and 0 dB < TJR < 20 dB. (b) Cumulative probability distributions for the correlation of two sources with specified TJR ranging from -20 dB to 20 dB.

Choice  of  threshold 

Based on the above analysis, it is possible to select the threshold to achieve a desired result. Perhaps the most obvious option is to choose threshold $\rho_0 = \sin(kd)/kd$ because, as seen in Sec. 5.2.2, at that point $\rho_d$ and $\rho_{tot}$ are equal for any level of reverberation, so with that threshold, the performance will correspond to a single point on the ROC curve for all levels of reverberation. Each curve in Fig. 5.6 is labeled with ‘x’ at the point corresponding to $\rho_0 = \sin(kd)/kd$.

One drawback to this approach concerns the distribution of errors as a function of jammer angle. Figure 5.6 shows that for $kd = \pi$ and $\rho_0 = \sin(kd)/kd = 0$, the detection rate is less than 75%, so the system will miss more than 25% of the time when TJR < 0 dB. This would not be a problem if these errors were distributed more or less evenly over jammer angle, because, as discussed earlier, it is reasonable to permit some misses to obtain a low rate of false alarms. However, these misses are not evenly distributed; rather, they are concentrated where correlation is high, which corresponds to small jammer angles.

To illustrate this, consider the correlation produced by a single jammer arriving from angle $\theta$ given by (5.4). With $kd = \pi$, $\rho$ = 0.69 for $\theta$ = 15° and $\rho$ = 0 for $\theta$ = 30°. So with the threshold set to $\rho_0 = 0$ and with no target signal present, the system will not adapt in the presence of jammers arriving from angles between 15° and 30°. This range of angles is slightly higher for values of $kd < \pi$. And when the target signal is included, this range of angles extends beyond 30°, increasing with TJR.
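The single-jammer values quoted above follow from the relationship ρ = cos(kd sin θ), which is the form of (5.4) implied by those values. The sketch below, with hypothetical function names, evaluates that relationship and inverts it to find the angle at which a lone source crosses a given threshold. The inversion is valid only when arccos(ρ₀)/kd does not exceed one, which always holds for kd = π.

```python
import math


def source_correlation(theta_deg, kd=math.pi):
    """Intermicrophone correlation of a single source at angle theta (degrees)."""
    return math.cos(kd * math.sin(math.radians(theta_deg)))


def threshold_angle(rho0, kd=math.pi):
    """Angle (degrees) at which a single source's correlation equals rho0.

    With no target present, jammers inside this angle have correlation above
    the threshold and therefore do not trigger adaptation.
    """
    return math.degrees(math.asin(math.acos(rho0) / kd))


for theta in (15.0, 30.0):
    print(f"theta = {theta:4.1f} deg -> rho = {source_correlation(theta):.2f}")
print(f"threshold 0 is crossed at {threshold_angle(0.0):.1f} deg")
```

Raising the threshold moves the crossing angle toward the target direction, so fewer small-angle jammers go undetected.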

In the remainder of this work, the threshold will be set to $\rho_0 = 0$ (that is, $\rho_0 = \sin(kd)/kd$ for $kd = \pi$), and it is accepted that the system will not adapt in response to jammers at angles below 30°. This is reasonable since the original choice of $\theta_0$ = 14.5° was somewhat arbitrary and since little is known about how such systems will be affected by head movements when worn by human listeners. However, when real-time adaptive multiple-microphone systems are designed for field tests, it is suggested that a user input control the threshold selection. In this manner, the user can adjust the effective beamwidth of the system over some range, but the cost of narrower beamwidths is higher false alarm rates, resulting in potential degradations of the target signal at high TJR.

For these future systems, it will be useful to quantify the effect of increasing the threshold. This corresponds to moving to the right along the ROC curve, so that $P_f$ increases together with $P_d$. The problem is that for unknown levels of reverberation, it is not possible to determine a mapping between the threshold and the location on the ROC curve. For example, with $kd = \pi$ (so that the diffuse-field term in (5.22) vanishes), if the threshold on $\rho_{tot}$ is $\rho_0$, then with no reverberation, performance is the same as for $\rho_d = \rho_0$; with a direct-to-reverberant ratio of 0 dB, performance is the same as for $\rho_d = 2\rho_0$; and with a direct-to-reverberant ratio of -10 dB, performance is the same as for $\rho_d = 11\rho_0$.

To illustrate this effect of reverberation on performance, in Fig. 5.6 the ROC curve corresponding to $kd = \pi$ is labeled at points corresponding to $\rho_0$ = 0.1, 0.2, ..., 0.8. As an example, if the threshold were set to $\rho_0$ = 0.3, then the maximum angle of an undetected jammer (with no target present) would decrease to 24°. The detection and false alarm rates would range from $P_d$ = 0.84 and $P_f$ = 0.09 (for $\rho_d = \rho_0$ = 0.3) in no reverberation to $P_d$ = 0.94 and $P_f$ = 0.26 (for $\rho_d = 2\rho_0$ = 0.6) with a direct-to-reverberant ratio of 0 dB. The false alarm rate would be much higher for direct-to-reverberant ratios below 0 dB. However, when the reverberation dominates, the effect of these errors may be less severe.
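The mapping between a threshold applied to the total correlation and the equivalent threshold on the direct correlation follows from solving (5.22) for the direct correlation. The sketch below illustrates that inversion; the function name is hypothetical, and the direct-to-reverberant ratio is passed in dB for convenience.

```python
import math


def equivalent_direct_threshold(rho0, drr_db, kd=math.pi):
    """Direct-correlation value that maps to a total-correlation threshold rho0.

    Obtained by solving (5.22) for rho_d, with W = 10**(drr_db / 10).
    """
    w = 10.0 ** (drr_db / 10.0)
    rho_r = math.sin(kd) / kd  # diffuse-field correlation (5.21)
    return ((w + 1.0) * rho0 - rho_r) / w


for drr_db in (20.0, 0.0, -10.0):
    rho_d = equivalent_direct_threshold(0.3, drr_db)
    print(f"DRR = {drr_db:5.1f} dB: threshold 0.3 on rho_tot acts like {rho_d:.2f} on rho_d")
```

An equivalent value above one means that every possible direct correlation falls below the threshold, which is consistent with the much higher false alarm rates expected at direct-to-reverberant ratios below 0 dB.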

Furthermore, (5.22) suggests that as reverberation increases, the fluctuations in the correlation measure due to the direct target and jammer signals decrease. Therefore, if it were necessary to determine the direct-to-reverberant ratio of the acoustic environment, it might be possible to estimate that quantity from the variance of the correlation measurement over a reasonably long interval.

Determining  broadband  TJR 

The  preceding  narrowband  analysis  has  considered  using  a  correlation  measurement 
from  one  pair  of  microphones  as  a  decision  variable  in  the  TJR  hypothesis  test  for 
a  particular  frequency.  However,  the  proposed  method  of  controlling  the  adaptive 
process  actually  requires  a  single  global  decision  as  to  the  range  of  the  broadband 
TJR.  This  raises  two  questions.  First,  if  only  two  microphones  are  available,  how 
accurately  can  the  range  of  the  broadband  TJR  be  determined  from  a  measure  based 
on  the  narrowband  analysis?  Second,  if  more  than  two  microphones  are  available, 
how  can  the  information  from  different  pairs  of  microphones  be  combined  to  generate 
the  most  accurate  decisions  concerning  the  range  of  broadband  TJR? 

In addressing the first question, it should be noted that the correlation measurement corresponding to each microphone pair will not be based on a single frequency; rather, it will be computed for the portion of the signal in a relatively narrow band about the frequency, $f$, determined by (5.30). The sensitivity of the proposed method to varying bandwidth will be investigated in computer simulations in the next section.

For arrays of more than two microphones, a different correlation measurement can be obtained from each pair of microphones. If different pairs of microphones have different spacings, then they will provide correlation measurements based on different frequencies, according to (5.30). To address the second question, three methods of combining correlation measures are proposed. The first method, voting, consists of first comparing each correlation measurement to the threshold, and then using the majority of those decisions to determine the outcome. The second method, averaging, consists of first averaging the correlation measurements and then comparing the average to the threshold. The third method, power-weighted averaging, is the same as averaging except that each correlation measurement is first weighted by a running measure of the power in the corresponding frequency band. All three of these methods will be investigated in computer simulations in the next section.
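The three combination rules described above can be sketched as follows. This is an illustrative reading of the text, not the implementation used in the simulations. Each function returns True when the combined evidence indicates low TJR (correlation below the threshold). Two details are assumptions: a tied vote is treated as "do not adapt", and the power-weighted average is normalized by the total power.

```python
def vote(correlations, threshold=0.0):
    """Majority vote of the per-pair decisions 'correlation < threshold'."""
    low_votes = sum(1 for rho in correlations if rho < threshold)
    return 2 * low_votes > len(correlations)  # assumed: ties count as high TJR


def average(correlations, threshold=0.0):
    """Compare the unweighted mean correlation to the threshold."""
    return sum(correlations) / len(correlations) < threshold


def power_weighted_average(correlations, powers, threshold=0.0):
    """Compare the power-weighted mean correlation to the threshold.

    powers holds a running power estimate for each pair's frequency band;
    dividing by the total power is an assumed normalization.
    """
    weighted = sum(rho * p for rho, p in zip(correlations, powers))
    return weighted / sum(powers) < threshold


rhos = [-0.2, -0.1, 0.9]    # hypothetical smoothed correlations from three pairs
band_power = [5.0, 4.0, 0.1]  # hypothetical band powers
print(vote(rhos), average(rhos), power_weighted_average(rhos, band_power))
```

In this made-up case the methods disagree: two of three pairs vote for low TJR, the plain average is pulled positive by one strongly correlated pair, and power weighting discounts that pair because its band carries little energy.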

5.3  Simulations 

The simulations in this section are intended to illustrate the utility of the method analyzed in Sec. 5.2. A block diagram of the system for determining the range of short-term TJR is shown in Fig. 5.8. First, each pair of microphone signals is bandpass filtered. The arithmetic center frequency of the bandpass filter is determined from the microphone spacing and (5.30). Next, the instantaneous correlation is computed using a hard limiter, as in Greenberg and Zurek (1992), resulting in a binary quantity that is the product of the signs of the bandpass filtered signals.⁷ The instantaneous correlation is then smoothed by a first-order recursive lowpass filter with a time constant of 10 ms. This value was chosen because it is suitable for tracking the fluctuations in speech levels (Greenberg, 1989). Finally, the lowpass filtered correlation values from all frequency bands are combined and compared to the threshold according to one of the three methods (voting, averaging, or weighted averaging) described in Sec. 5.2.3. The result is a binary decision about the range of TJR, TJR < 0 dB or TJR > 0 dB, that determines whether or not the adaptive weights should be updated.

⁷ The hard limiter is used to simplify the implementation in real-time systems. In addition to replacing multiplies with sign-bit comparisons, the hard limiting eliminates the need to divide by the square root of the signal powers, which would otherwise be needed for a normalized correlation measure.

Figure 5-8: Block diagram of system to determine range of TJR. Each pair of microphone signals is bandpass filtered, the signs of the bandpass filtered signals are multiplied to produce the instantaneous correlation, and these values are smoothed by a first-order lowpass filter with a time constant of 10 ms. The lowpass filtered correlation values are combined and compared to the threshold by one of the three methods described in the text. The result is a binary decision about the range of TJR, TJR < 0 dB or TJR > 0 dB.
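The core of the processing chain described above (hard limiting, sign multiplication, and first-order smoothing) can be sketched for a single microphone pair as follows. The bandpass filter is omitted, so the inputs are assumed to be already band-limited. The sampling rate and the mapping from time constant to filter coefficient, alpha = exp(-1/(fs·tau)), are assumptions rather than details given in the text.

```python
import math


def sign(x):
    """Hard limiter: +1 for non-negative samples, -1 otherwise."""
    return 1.0 if x >= 0.0 else -1.0


def smoothed_sign_correlation(x1, x2, fs, tau=0.010):
    """Running correlation from the product of signs, smoothed by a
    first-order recursive lowpass filter with time constant tau (seconds)."""
    alpha = math.exp(-1.0 / (fs * tau))  # assumed coefficient mapping
    state = 0.0
    output = []
    for a, b in zip(x1, x2):
        instantaneous = sign(a) * sign(b)  # binary quantity, +1 or -1
        state = alpha * state + (1.0 - alpha) * instantaneous
        output.append(state)
    return output


def low_tjr(rho, threshold=0.0):
    """Decision rule: adapt only when the smoothed correlation is below threshold."""
    return rho < threshold


fs = 10_000.0  # hypothetical sampling rate in Hz
tone = [math.sin(2.0 * math.pi * 2464.0 * n / fs + 0.3) for n in range(1000)]
in_phase = smoothed_sign_correlation(tone, tone, fs)
out_of_phase = smoothed_sign_correlation(tone, [-v for v in tone], fs)
print(low_tjr(in_phase[-1]), low_tjr(out_of_phase[-1]))
```

Identical inputs stand in for a broadside target (correlation near +1, no adaptation), while inverted inputs stand in for a source whose interaural delay is half a period at the center frequency (correlation near -1, adaptation allowed).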
5.3.1  Noise


The first set of simulations is intended to verify the analysis of Sec. 5.2 and to investigate the robustness of the algorithm to variations in bandwidth.

For these simulations, the target and jammer sources were 14000 samples of zero-mean Gaussian noise. The jammer source had unit variance. The target source was scaled to produce a TJR that was constant over 1000-sample intervals, incrementing in 3 dB steps between -19.5 and 19.5 dB.

These source signals were convolved with anechoic source-to-microphone impulse responses generated by the room simulation described in Sec. 3.2. The array contained two microphones with 7 cm spacing. The target angle varied from 0 to 12 degrees and the jammer angle varied from 18 to 90 degrees, both in 4 degree increments. For all 76 combinations of target and jammer angles, the running correlation was computed as shown in Fig. 5.8 and compared to a threshold of $\rho_0 = 0$. For comparison, the running correlation was also computed without the hard limiter. The bandpass filter had a center frequency of 2464 Hz, computed from (5.30). The filter bandwidth varied from 10% to 180% of the center frequency (246 Hz to 4435 Hz), and was centered arithmetically.
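The bookkeeping of this simulation design can be laid out in a few lines. The sketch below only reconstructs the TJR schedule and the angle grid from the numbers given above; the helper names are hypothetical, and the amplitude gain assumes the unscaled target has unit variance like the jammer.

```python
def tjr_schedule_db(start=-19.5, stop=19.5, step=3.0):
    """TJR values (dB) held constant over successive 1000-sample blocks."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


def target_gain(tjr_db):
    """Amplitude scale for the target, assuming unit-variance target and jammer."""
    return 10.0 ** (tjr_db / 20.0)


schedule = tjr_schedule_db()
target_angles = list(range(0, 13, 4))    # 0, 4, 8, 12 degrees
jammer_angles = list(range(18, 91, 4))   # 18, 22, ..., 90 degrees
combinations = [(t, j) for t in target_angles for j in jammer_angles]

print(len(schedule) * 1000, "samples,", len(combinations), "angle combinations")
print("blocks with TJR < 0 dB:", sum(1 for tjr in schedule if tjr < 0.0))
```

Because the schedule is symmetric about 0 dB, the first half of each signal exercises the detection rate and the second half exercises the false alarm rate.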

For each sample point, the binary result of the processing was compared to the true TJR. Since TJR < 0 dB for the first half of the source signals and TJR > 0 dB for the second half, the detection rate was computed from the first 7000 points and the false alarm rate was computed from the last 7000 points. These values of $P_d$ and $P_f$ were averaged for all combinations of target and jammer angles. The results are shown in the left half of Fig. 5.9. For a wide range of bandwidths, $P_d \approx 0.7$ and $P_f < 0.1$.

The analysis in Sec. 5.2.3 indicates that for $\rho_0 = 0$ and $kd = \pi$ the probabilities of detection and false alarm are $P_d$ = 0.73 and $P_f$ = 0.007. This corresponds to the points marked with ‘x’ on the upper curve in Fig. 5.6 and to the points marking the narrowband analysis in Fig. 5.9. In general, there is good agreement between the values predicted by the analysis of Sec. 5.2.3 and the results of the simulations. Several factors are responsible for both the discrepancies between the analysis and the simulation results and the trends in the simulation results. Those factors will be discussed after first summarizing the relationship between correlation measures, thresholds, and false alarm and detection rates.

Figure 5-9: Simulation results showing rates of detection and false alarms, as a function of bandwidth, for noise received by two microphones in an anechoic environment. The points labeled with ‘x’ indicate the performance using the hard limiter (taking the sign of bandpass filtered signals as shown in Fig. 5.8), while the points labeled with ‘o’ indicate the performance without the hard limiter (the bandpass filtered signals are multiplied directly). The remaining labeled points correspond to the values determined by the narrowband analysis of Sec. 5.2.3. [Panel title: noise, jammer 18-90 deg.]

The system shown in Fig. 5.8 is credited with a correct detection when the measured correlation is less than the threshold ($\rho < \rho_0$) and the true TJR < 0 dB. False alarms occur when $\rho < \rho_0$ and the true TJR > 0 dB. Increasing the threshold increases the rate of both detections and false alarms (moving to the right on the ROC curve), while decreasing the threshold decreases the rate of both detections and false alarms (moving to the left on the ROC curve). Any factor that causes the measured correlation, $\rho$, to decrease will have the same effect as raising the threshold, $\rho_0$, and conversely, anything that causes the measured correlation to increase will have the same effect as lowering the threshold.

The most striking discrepancy between the analysis and simulation results is the relatively low false alarm rate predicted by the analysis. This results from the approximation in (5.16) used for the target pdf. The approximation overrepresents $\theta_t = 0$ and underrepresents all other $0 < \theta_t < \theta_0$. This causes the analysis results to be biased towards $\theta_t = 0$, which contributes higher correlation values. The higher correlation values cause the analysis to predict lower rates of detection and false alarms. This approximation has much more of an impact on the false alarm rates, because it only affects the target signal’s contribution to correlation, and false alarms occur when the target signal dominates (TJR > 0 dB).

Figure 5.9 illustrates the effects of varying bandwidth. The original analysis was performed for narrowband sources. Obviously, using wideband signals will include additional frequencies in the computation of $\rho$. The effect of these frequencies depends on the source angle. For a rectangular band of noise arithmetically centered at $f$ with bandwidth $B$, the mean correlation value is given by

$$\rho = \cos(2\pi f\tau)\,\frac{\sin(\pi B\tau)}{\pi B\tau} \tag{5.31}$$

(McConnell, 1985). Substituting (5.2) and (5.30) into (5.31) gives the relationship between source angle and correlation for $kd = \pi$ for arbitrary bandwidth as a fraction of center frequency, that is,

$$\rho = \cos(\pi\sin\theta)\,\frac{\sin\left(\pi\frac{b}{2}\sin\theta\right)}{\pi\frac{b}{2}\sin\theta}, \tag{5.32}$$

where $b$ is the fractional bandwidth $B/f$. Figure 5.10 illustrates this relationship for several fractional bandwidths. In Fig. 5.10 the curve for $b$ = 0.1 is indistinguishable from the pure cosine obtained for a narrowband source.

Figure 5-10: Theoretical relationship between correlation and source angle, for fractional bandwidths as indicated, ranging from 0.1 to 1.8.
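The wideband relationship (5.32) is straightforward to evaluate. The sketch below uses a hypothetical function name and assumes kd = π at the center frequency. It shows the two properties discussed in the text: the zero crossing stays at 30° for every bandwidth, and the magnitude of the correlation shrinks as the fractional bandwidth grows.

```python
import math


def wideband_correlation(theta_deg, b):
    """Mean correlation (5.32) for a source at theta (degrees) and fractional
    bandwidth b = B/f, with kd = pi at the center frequency."""
    s = math.sin(math.radians(theta_deg))
    x = math.pi * (b / 2.0) * s
    envelope = 1.0 if x == 0.0 else math.sin(x) / x  # sin(x)/x term, 1 at x = 0
    return math.cos(math.pi * s) * envelope


for b in (0.1, 0.67, 1.8):
    values = ", ".join(f"{wideband_correlation(t, b):+.2f}" for t in (0, 15, 30, 60, 90))
    print(f"b = {b:4.2f}: rho at 0/15/30/60/90 deg = {values}")
```

The attenuation is strongest at large angles, where the sin(x)/x argument is largest, which is consistent with the observation that bandwidth affects jammers more than targets.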

From  (5.31)  and  Fig.  5.10  it  is  clear  that  the  effect  of  increasing  signal  bandwidth 
is  to  decrease  the  magnitude  of  the  correlation  value.  For  source  angles  less  than  30°, 
wideband  signals  provide  smaller  positive  values  while  for  source  angles  greater  than 
30°,  they  provide  smaller  negative  values.  Furthermore,  the  magnitude  of  this  effect 


107 


is  more  pronounced  for  jammers  (15®  —  90°)  than  for  targets  (0°  —  15°).  Therefore, 
the  effect  of  increasing  the  bandwidth  is  to  increase  the  correlation  measure,  leading 
to  lower  rates  of  detection  and  false  alarms.^ 

Another  factor  that  affects  the  performance  of  the  correlation  measure  is  the 
use  of  a  hard  limiter  to  compute  the  instantaneous  correlation.  For  Gaussian  sig¬ 
nals,  the  relationship  between  the  hard-Umited  correlation  and  the  true  correlation  is 
phi  =  f  arcsin(ptrue)  (Papoulis,  1984).  The  result  of  this  arcsine  transformation  is  to 
compress  the  relationship  between  TJR  and  p  near  p  =  0,  which  may  lead  to  more 
errors  in  both  directions,  that  is,  increased  false  alarms  and  decreased  detections. 
Figure  5.9  shows  the  results  of  simulations  with  and  without  hard  Hmiting.  Although 
the  effect  of  hard  limiting  is  to  increase  errors,  the  increase  is  slight,  and  for  many 
applications  may  be  worth  the  savings  in  computational  complexity.® 

Finally,  as  discussed  in  Sec.  5.2.3,  the  errors  are  not  distributed  evenly  as  a  function 
of  jammer  angle,  rather,  the  “misses”  are  concentrated  when  jammer  angle  is  small. 
The  right  half  of  Fig.  5.9  shows  Pd  and  P/  when  jammer  angles  between  18  and  34 
degrees  are  ignored.  For  the  range  of  jammer  angles  between  38  and  90  degrees, 
the  overall  performance  is  much  better,  with  Pd  ^  0.9,  and  only  a  slight  increase  in 
Pf.  To  illustrate  the  dependence  on  jammer  angle,  Fig.  5.11  shows  Pd  and  Pf  as  a 
function  of  jammer  angle  for  a  fractional  bandwidth  of  0.67.  The  four  curves  in  each 
panel  show  the  performance  for  the  four  target  angles,  6t  =  0,4,8,12'^.  Again  the 


^This  explains  the  general  trend  of  decreasing  Pd  and  Pf  with  increasing  bandwidth,  but  not  the 
initial  increase  seen  in  Fig.  5.9.  Because  the  detection  and  false  alarm  rates  described  by  (5.27)  and 
(5.28)  correspond  to  the  area  of  the  measurement’s  conditional  pdf  that  is  below  the  threshold,  those 
values  depend  on  the  entire  pdf,  not  just  its  mean.  For  the  fractional  bandwidths  studied  (0.1  to 
1.8),  the  variance  of  the  correlation  value  was  observed  to  decrease  with  increasing  bandwidth.  This 
is  because  the  wider  bandwidths  contain  more  frequencies,  making  the  result  less  sensitive  to  the 
fluctuations  of  individual  frequency  components.  For  some  conditions  of  target  angle,  jammer  angle, 
and  TJR,  as  the  bandwidth  increased,  this  secondary  effect  of  decreasing  variance  was  sufficient  to 
cause  increases  in  Pf  and  P^,  despite  the  increase  in  the  mean  of  the  correlation  measure.  This 
accounts  for  the  non-monotonicity  of  the  simulation  results  in  Fig.  5.9. 

®The  purpose  of  the  hard  limiter  is  to  replace  multiplies  with  sign-bit  comparisons  and  eliminate 
the  need  for  normalizing  the  correlation  measure.  However,  when  the  threshold  is  zero,  only  the  sign 
of  the  correlation  measure  is  relevant,  and  the  normalization  is  not  required,  since  it  will  not  affect 
the sign of the resulting correlation measure. Therefore, when the threshold is zero, it is possible
to eliminate the hard limiter, especially on hardware platforms that require the same resources for
multiplication  as  for  sign-bit  comparison. 




Figure 5-11: Simulation results showing rates of detection and false alarms, as a
function of jammer angle, for noise received by two microphones in an anechoic
environment. The fractional bandwidth was 0.67, and results are shown for four target
angles, θt = 0, 4, 8, and 12°.

TJR  varied  in  3  dB  steps  from  -19.5  dB  to  19.5  dB.  Figure  5.11  shows  that  there  is 
essentially no detection of TJR < 0 dB when jammers are located at angles less than
30 degrees and a rapid transition in the detection of TJR < 0 dB when the jammer
angle  changes  from  30°  to  40°.  Similarly,  the  false  alarm  rate  increases  with  increases 
in  either  target  angle  or  jammer  angle. 


5.3.2  Speech 

The second set of simulations was intended to investigate the effect of speech sources
and reverberation. These simulations included both two- and five-microphone arrays.
The target and jammer sources were sentences and babble prepared as described in
Sec. 3.1. The source signals were scaled so that the long-term TJR was 0 dB.

Two-microphone  array 

The  7-cm,  two-microphone  broadside  array  was  evaluated  using  three  sentences  and 
babble  for  each  of  the  76  combinations  of  target  and  jammer  angles.  All  of  the 
processing  was  the  same  as  for  the  noise  sources  in  the  previous  section.  The  hard 
limiting  was  included  in  the  processing. 

The performance was evaluated by averaging the rates of detection and false
alarms for all conditions and comparing to the true TJR. The true TJR was
determined by squaring the target and jammer signals received at the leftmost®
microphone, processing the squared signals with the same 10 ms lowpass filter used to
smooth the correlation measure, and computing the ratio of the two filtered signals.
The values of Pd and Pf were computed in two ways: by comparing the binary
decision to the true broadband TJR, and by comparing it to the true bandpass TJR in
the frequency band used to compute the correlation measure.
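
The computation of the true short-term TJR described above can be sketched as follows. This is a minimal illustration, not the processing used to generate the results: the 10 ms lowpass filter is assumed here to be a one-pole smoother with a 10 ms time constant, the sampling rate of 10 kHz is an arbitrary choice, and the function names are hypothetical.

```python
import math

def one_pole_lowpass(x, fs, tau):
    """Smooth x with a one-pole lowpass of time constant tau seconds (assumed filter type)."""
    alpha = math.exp(-1.0 / (fs * tau))
    y, state = [], 0.0
    for sample in x:
        state = alpha * state + (1.0 - alpha) * sample
        y.append(state)
    return y

def short_term_tjr_db(target, jammer, fs, tau=0.010, eps=1e-20):
    """Short-term TJR in dB from the separately available target and jammer signals."""
    # Square each signal, smooth the squared signals, then take the ratio.
    target_power = one_pole_lowpass([s * s for s in target], fs, tau)
    jammer_power = one_pole_lowpass([s * s for s in jammer], fs, tau)
    return [10.0 * math.log10((pt + eps) / (pj + eps))
            for pt, pj in zip(target_power, jammer_power)]

# Toy check with constant signals: amplitudes 2 and 1 give a power ratio of 4.
fs = 10000  # assumed sampling rate in Hz
tjr = short_term_tjr_db([2.0] * 500, [1.0] * 500, fs)
print(round(tjr[-1], 2))
```

The bandpass TJR would be obtained in the same way after first filtering both signals to the frequency band used by the correlation measure.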

Figure  5.12  shows  the  results.  The  general  trends  in  performance  with  speech 
signals  follow  the  trends  seen  in  Fig.  5.9  with  noise.  As  would  be  expected,  the 
performance  is  better  (higher  detection  rate  and  lower  false  alarm  rate)  when  it  is 
referenced  to  the  TJR  in  the  frequency  band  used  by  the  correlation  measure  than 
when  it  is  referenced  to  the  broadband  TJR. 

Comparing Fig. 5.12 with Fig. 5.9 does show some differences in the results
obtained with speech and noise. The two most significant differences consist of a drastic
reduction  in  detection  rate  at  large  bandwidths  and  an  overall  increase  in  the  false 
alarm  rate. 

The lower detection rates for large fractional bandwidths are due to the difference
between  noise  and  speech  source  signals.  Unlike  the  noise  used  in  the  previous  section, 
the  frequency  components  of  speech  are  not  evenly  distributed  within  the  band  of 
interest.  This  makes  the  performance  more  sensitive  to  increased  bandwidth,  and 

®when  facing  the  target 
Figure 5-12: Simulation results showing rates of detection and false alarms, as a
function of bandwidth, for speech and babble received by two microphones in an
anechoic environment. The points labeled with ‘x’ indicate the performance referenced
to the bandpass TJR, while the points labeled with ‘o’ indicate the performance
referenced to the broadband TJR. The points labeled with correspond to the values
determined by the narrowband analysis of Sec. 5.2.3.

accounts for the low rate of detection for fractional bandwidths greater than unity.

The  overall  increase  in  the  false  alarm  rate  is  due  to  the  range  of  TJRs  available  in 
the  signals.  For  the  noise  signals,  the  TJR  varied  in  3  dB  steps  that  included  -1.5  dB 
and  1.5  dB.  For  the  speech  signals,  the  short-term  TJR  varied  continuously  through  a 
range  of  TJRs  including  0  dB.  Many  of  the  false  alarms  occurred  on  these  transitions. 
This can be seen in Fig. 5.13, which shows the true TJRs for one sentence along with
the  locations  of  misses  and  false  alarms. 

In order to investigate the usefulness of the correlation method in reverberation,
a subset of the conditions described above was repeated for the reverberant
source-to-microphone impulse responses described in Sec. 3.2. The target angle was either
0  or  12  degrees  and  the  jammer  angle  was  38,  54,  70,  or  86  degrees.  For  the  eight 
combinations  of  target  and  jammer  angles,  the  rates  of  detection  and  false  alarm 
were  computed  as  described  above  for  the  same  three  sentences  and  babble.  For  the 
reverberant  conditions,  the  true  TJRs  used  to  assess  performance  were  computed 
from  the  direct  wave  of  target  and  jammer  signals;  reflections  were  not  included. 

The results are shown in Fig. 5.14.^^ The addition of reverberation lowers the
detection rates overall and causes further reduction in detection rates with increasing
bandwidth. The addition of reverberation causes no distinct trends in false alarm
rates.

With  a  two-microphone  array,  only  one  correlation  measure  can  be  computed.  As 
a  result,  bandwidth  selection  is  governed  by  two  conflicting  goals.  On  one  hand,  it 
is  desirable  to  use  a  wide  frequency  band,  so  that  the  correlation  measure  will  reflect 
as  much  information  as  possible  about  the  broadband  TJR.  On  the  other  hand,  the 
use of relatively wide bandwidths is limited by their negative impact on detection
rates.  This  tradeoff  can  be  seen  in  Figs.  5.12  and  5.14,  where  increasing  bandwidth 
causes  the  broadband  and  bandpass  results  to  converge,  but  also  causes  a  substantial 
reduction  in  detection  rates.  In  order  to  balance  these  two  conflicting  requirements, 
a fractional bandwidth of 0.67 (one octave) will be used with two-microphone arrays

^^The  two  plots  on  the  left  contain  results  obtained  with  the  anechoic  room  impulse  responses 
for  the  same  eight  combinations  of  target  and  jammer  angles,  that  is,  the  values  were  obtained  by 
averaging  a  subset  of  the  data  points  that  were  averaged  to  generate  the  values  shown  in  Fig.  5.12. 

Figure 5-14: Simulation results showing rates of detection and false alarms, as a
function of bandwidth, for speech and babble received by two microphones in anechoic
and reverberant environments. The points labeled with ‘x’ indicate the performance
referenced to the bandpass TJR, while the points labeled with ‘o’ indicate the
performance referenced to the broadband TJR.


spacing (cm)    desired center frequency (Hz)    actual frequency band (Hz)
      4                     4313                        3000-5000
      8                     2156                        1700-3000
     12                     1438                        1250-1700
     16                     1078                         860-1250

Table 5.1: Frequency bands used with intermicrophone spacings obtained from a
16-cm, five-microphone array.

in  the  remainder  of  this  work. 

Five-microphone  array 

An M-microphone array has M(M - 1)/2 possible pairs. If the array elements are evenly
spaced, then those pairs represent M - 1 distinct values of intermicrophone spacing.
For  a  five-microphone  array,  there  are  10  possible  pairs  with  4  distinct  spacings.  In 
contrast  to  the  two-microphone  case,  obtaining  adequate  frequency  coverage  is  not 
a  problem  because  of  the  variety  in  microphone  spacing.  Instead,  the  issue  is  to 
determine  the  best  method  for  combining  the  correlation  measurements  generated 
from  different  spacings  and  frequency  bands. 

The  simulated  five-microphone  array  was  16  cm  long,  with  4  cm  spacing  between 
microphones. The leftmost microphone was paired with each of the other four
microphones to produce four smoothed correlation measures. The desired center frequency
for  each  pair  was  determined  according  to  (5.30).  The  actual  cutoff  frequencies  were 
selected  to  provide  bands  roughly  centered  at  the  center  frequencies  without  overlap 
between  neighboring  bands.  The  values  used  are  given  in  Table  5.1. 
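
The desired center frequencies in Table 5.1 can be reproduced with a short calculation. The sketch below assumes that (5.30), which is not repeated in this section, expresses the condition kd = π (spacing equal to half a wavelength), and it assumes a speed of sound of 345 m/s; that value is inferred from the table entries rather than stated in the text. The function name is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 345.0  # assumed value, inferred from Table 5.1

def center_frequency_hz(spacing_cm, c=SPEED_OF_SOUND_M_PER_S):
    """Frequency at which k*d = pi, i.e. the spacing d is half a wavelength."""
    d = spacing_cm / 100.0          # convert cm to m
    return c / (2.0 * d)            # k = 2*pi*f/c, so k*d = pi gives f = c/(2d)

centers = {d: center_frequency_hz(d) for d in (4, 8, 12, 16)}
for d, f in centers.items():
    print(f"{d:2d} cm -> {f:7.1f} Hz")
```

Under these assumptions the four spacings give values within 1 Hz of the desired center frequencies listed in Table 5.1.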

The same processing used for the two-microphone array was applied to each pair
of microphone signals to produce a smoothed correlation measure. Then those
measures were combined using each of the three methods (voting, averaging, and
power-weighted averaging) described in Sec. 5.2.3. The results obtained using those three
methods were compared to the true broadband TJR to generate rates of detection
and false alarms. For the voting method, ties were resolved by selecting H1 (saying
[Figure 5-15 panel titles: speech, jammer 18-90; speech, jammer 38-90]

Figure 5-15: Simulation results showing rates of detection and false alarms for speech
and babble received by five microphones in an anechoic environment. Results are
shown for voting, averaging, and power-weighted-averaging methods described in the
text, labeled vote, avg, and pavg, respectively.

TJR  >  0  dB),  resulting  in  lower  rates  of  both  detections  and  false  alarms. 
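
The difference between the voting and averaging rules, including the tie-breaking choice, can be illustrated with a small sketch. It assumes a threshold of zero and that a smoothed correlation measure below the threshold indicates H0 (TJR < 0 dB), consistent with the detection rates being defined by the area of the pdf below the threshold. The function names are hypothetical, and power-weighted averaging is omitted because its weighting is specified in Sec. 5.2.3 rather than here.

```python
def vote_decision(correlations, threshold=0.0):
    """Each band votes H0 if its measure is below the threshold; ties go to H1."""
    votes_h0 = sum(1 for rho in correlations if rho < threshold)
    votes_h1 = len(correlations) - votes_h0
    return "H0" if votes_h0 > votes_h1 else "H1"

def average_decision(correlations, threshold=0.0):
    """Compare the unweighted mean of the band measures to the threshold."""
    mean_rho = sum(correlations) / len(correlations)
    return "H0" if mean_rho < threshold else "H1"

# Four bands, two on each side of the threshold (made-up values).
measures = [0.3, -0.2, 0.1, -0.4]
tie_result = vote_decision(measures)       # 2 votes each, so the tie resolves to H1
avg_result = average_decision(measures)    # mean is -0.05, below the threshold
print(tie_result, avg_result)
```

With four bands a tie is possible, and resolving it toward H1 means that H0 is declared only when a strict majority of bands agree, which is why both the detection and false alarm rates are lowered.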

Figure 5.15 shows the results for the 5-microphone array in an anechoic
environment. The values are averages of 76 pairs of target/jammer angles and three
sentences. Comparing the three methods of combining correlation measures, the voting
and power-weighted-averaging methods are comparable, while the averaging method
has a relatively high rate of false alarms.

Comparing  Fig.  5.15  to  Fig.  5.12  reveals  that  in  general,  using  four  bands  obtained 
from  five  microphones  outperforms  the  one  band  obtained  from  two  microphones. 
However, that improvement is modest compared to the more than four-fold increase
in computational requirements. Future investigations could quantify the incremental
benefit  obtained  from  each  additional  pair  of  microphones  used  to  compute  correlation 
measures.  This  would  allow  the  system  designer  to  select  the  number  of  microphone 
pairs  and  frequency  bands  needed  for  a  particular  application,  trading  relatively  small 
reductions  in  performance  for  large  reductions  in  computational  complexity.  For 
example,  it  is  possible  that  combining  correlation  measures  from  two  microphone  pairs 
with  appropriate  spacing  could  provide  performance  comparable  to  that  obtained  with 
four  microphone  pairs  shown  in  Fig.  5.15,  at  half  the  computational  complexity. 

Figure  5.16  shows  results  in  reverberation  for  the  same  eight  pairs  of  target  and 
jammer  angles  considered  for  the  two-microphone  array.  The  performance  of  the  five- 
microphone  array  in  reverberation  is  consistent  with  the  trends  previously  observed. 
The  presence  of  reverberation  reduces  the  detection  rate,  with  no  clear  trends  in  the 
false alarm rate. Again, the voting and power-weighted-averaging methods are
comparable, while the averaging method has a higher rate of false alarms. Although there
appears to be a slight advantage to power-weighted averaging, it requires
substantially more computation than the voting method. Therefore, the remainder of this
work will use the voting method for combining multiple correlation measures.

5.4  Discussion 

The purpose of this chapter is to investigate using quantities derived from the
correlations between pairs of microphones as decision variables in a hypothesis test
concerning the range of TJR. The proposed method was analyzed for narrowband signals
with and without reverberation, and then implemented and evaluated in simulations
with both noise and speech signals. Despite many violations of the assumptions used
in the original analysis, the simulations show that the narrowband analysis provides
useful insight into the more complicated case of broadband speech signals.

The analysis with narrowband signals revealed that the binary hypothesis test
produces the best results when the relationship between microphone spacing and
frequency is governed by kd = π. Furthermore, the uncertainties introduced by

Figure 5-16: Simulation results showing rates of detection and false alarms for speech
and babble received by five microphones in anechoic and reverberant environments.
Results are shown for voting, averaging, and power-weighted-averaging methods
described in the text, labeled vote, avg, and pavg, respectively.

varying levels of reverberation and the relative costs of the two types of errors indicate
that a reasonable choice for the correlation threshold is ρ_0 = 0.

The simulations considered both two- and five-microphone arrays. For a
two-microphone array, only one correlation measure can be computed. In this case, the
conflicting  demands  of  high  detection  rate  and  good  frequency  coverage  are  balanced 
by  selecting  an  intermediate  value  such  as  0.67  for  the  fractional  bandwidth.  For 
a  five-microphone  array,  the  multiple  correlation  measures  from  different  frequency 
bands  are  best  combined  by  using  the  voting  method,  so  as  to  balance  performance 
considerations  with  computational  demand. 

The  approach  proposed  and  evaluated  in  this  chapter  was  based  entirely  on  the 
correlation coefficient between microphone signals, that is, on the correlation with
zero  time  lag.  This  was  motivated  by  the  need  to  develop  a  method  with  relatively 
low  computational  complexity.  However,  the  overall  approach  analyzed  and  evaluated 
in  this  chapter  could  easily  be  extended  to  utilize  values  derived  from  the  correlation 
function  at  nonzero  lags. 
Chapter 6


Effect  of  target  reflections 


6.1  Introduction 

The LCMV beamformer assumes that the target direction is known and that the
target and jammer signals are uncorrelated. As discussed in Sec. 2.3.2, target
signal reflections violate one of these two assumptions. If the reflected target signal is
considered target, then the assumption of known target direction is violated. On the
other hand, if the reflected target is considered jammer, the assumption of
uncorrelated target and jammer is violated. In either case, the result is that the reflections of
the target signal arriving from directions other than that of the target source provide
the beamformer with information that may be used to cancel the target signal.

The implications of a system that cancels reverberant target signals must be
considered from the point of view of speech intelligibility. Clearly, if a system cancels
the direct wave of the target signal, it will have a detrimental effect on intelligibility.
However, if the system cancels reflections of the target signal, its effect is less obvious.
In general, early reflections contribute to intelligibility while late reflections degrade
intelligibility, where the boundary between early and late reflections lies 50-95 ms
after the direct wave (Cremer and Müller, 1982). Consequently, a system that cancels
early target reflections would degrade intelligibility, while a system that cancels late
target reflections would improve intelligibility. Unfortunately, within the structure of
the generalized sidelobe canceller there is no way to distinguish between early and
late reflections.

Although  the  ultimate  goal  of  this  work  is  to  improve  intelligibility,  which  requires 
consideration  of  early  and  late  target  reflections,  the  work  described  in  this  chapter 
takes  a  simpler  approach.  This  chapter  considers  the  effect  of  target  reflections  on  the 
performance  of  a  two-microphone  generalized  sidelobe  canceller,  but  the  results  also 
apply  to  adaptive  noise  cancellers  and  to  systems  with  any  number  of  microphones. 
The  performance  measures  are  the  gain  in  powers  of  the  direct  and  reflected  target 
signals. 

In  order  to  determine  the  effects  of  several  parameters  on  system  performance, 
simple  source-to-microphone  impulse  responses  are  created  to  account  for  a  small 
number of reflections. In addition, system performance is examined for the simulated
room  impulse  responses  described  in  Sec.  3.2. 


6.2  Background 

Figure  6.1  shows  a  block  diagram  of  the  system  considered  in  this  chapter.  It  is  based 
on  a  two-microphone  version  of  the  generalized  sidelobe  canceller  shown  in  Fig.  2.4. 
Unlike  that  system,  it  does  not  include  misadjustment  or  transient  behavior  due  to 
the LMS algorithm. This is accomplished by replacing the adaptive filter weights
by  their  optimal  values.  Furthermore,  the  environment  differs  in  that  no  jammer 
source  is  present,  because  the  target-only  condition  provides  a  worst-case  assessment 
of  target  cancellation. 

The target source, t_s(n), is filtered by two source-to-microphone room impulse
responses, h_{t0}(n) and h_{t1}(n). The two microphone signals are added and subtracted
to produce the primary and reference signals, respectively. The primary signal is

    z(n) = \frac{1}{2} (t_s(n) * (h_{t0}(n) + h_{t1}(n)))    (6.1)

Figure 6-1: Block diagram of target-only system with optimal weights. The input
is the target source, which is filtered by two source-to-microphone room impulse
responses, h_{t0}(n) and h_{t1}(n). The two microphone signals are added and subtracted to
produce the primary and reference signals, z(n) and x(n), respectively. The reference
signal is filtered by the optimal weights, w*, determined according to (6.3) in the
text. The result is subtracted from the primary signal, delayed by D samples, to
produce the system output.

and the reference signal is

    x(n) = \frac{1}{2} (t_s(n) * (h_{t0}(n) - h_{t1}(n))).    (6.2)

The reference signal is filtered by the optimal weights, which are determined according
to (4.11),

    w* = R^{-1} p,    (6.3)

where R is the autocorrelation matrix of the reference signal in the tapped delay
line of the adaptive filter and p is the cross-correlation vector between the delayed
primary signal and the reference signal in the tapped delay line. The filtered reference
signal is subtracted from the primary signal, delayed by D samples, to produce the
system output.

If the target source, t_s(n), is stationary, zero-mean white noise, then the
autocorrelation matrix, R, and the cross-correlation vector, p, can be determined from the
impulse responses h_{t0}(n) and h_{t1}(n), the primary delay, D, and the filter length, L,
after Zurek et al. (1990). This can be seen as follows. The autocorrelation matrix, R,
is the L x L symmetric Toeplitz matrix with entries in the ith row and jth column
given by

    R_{ij} = r(i - j),    i, j = 1, ..., L,    (6.4)

where r(k) is the autocorrelation function of the reference signal,

    r(k) = E[x(n) x(n - k)].    (6.5)

Similarly, the cross-correlation vector, p, is the L x 1 vector with the ith entry given
by

    p_i = p(i - 1 - D),    i = 1, ..., L,    (6.6)

where p(k) is the cross-correlation function between the primary and reference signals,

    p(k) = E[z(n) x(n - k)].    (6.7)

Substituting (6.1) and (6.2) into (6.5) and (6.7) and rearranging produces

    r(k) = \frac{1}{4} E[(t_s(n) * (h_{t0}(n) - h_{t1}(n))) (t_s(n) * (h_{t0}(n - k) - h_{t1}(n - k)))]
         = \frac{1}{4} r_t(k) * (h_{t0}(k) - h_{t1}(k)) * (h_{t0}(-k) - h_{t1}(-k))    (6.8)

and

    p(k) = \frac{1}{4} E[(t_s(n) * (h_{t0}(n) + h_{t1}(n))) (t_s(n) * (h_{t0}(n - k) - h_{t1}(n - k)))]
         = \frac{1}{4} r_t(k) * (h_{t0}(k) + h_{t1}(k)) * (h_{t0}(-k) - h_{t1}(-k))    (6.9)

where r_t(k) is the autocorrelation function of the target source, that is,

    r_t(k) = E[t_s(n) t_s(n - k)].    (6.10)

If t_s(n) is zero-mean white noise, then r_t(k) = \delta(k) (the unit impulse), and the
functions r(k) and p(k) become

    r(k) = \frac{1}{4} (h_{t0}(k) - h_{t1}(k)) * (h_{t0}(-k) - h_{t1}(-k))    (6.11)

and

    p(k) = \frac{1}{4} (h_{t0}(k) + h_{t1}(k)) * (h_{t0}(-k) - h_{t1}(-k)).    (6.12)

These expressions depend only on the source-to-microphone impulse responses, so,
if the target source is zero-mean white noise, then the optimal weights, w*, can be
computed from h_{t0}(n), h_{t1}(n), L, and D using (6.3), (6.4), (6.6), (6.11) and (6.12).

Assuming the source-to-microphone impulse responses h_{t0}(n) and h_{t1}(n) consist
of a direct wave that is equal in the two microphones, and reflections that differ in
the two microphones, their difference, h_{t0}(n) - h_{t1}(n), depends only on the reflections.
Examining (6.11) and (6.12) reveals that the autocorrelation function r(k) consists
only of terms related to the reflections, while the cross-correlation function p(k)
consists of terms related to both the direct wave and the reflections. The cross-correlation
function p(k) can be considered the sum of two components, the terms due only to
the reflections, and those due to the direct wave and reflections.^ Since the
cross-correlation vector p consists only of the terms of p(i - 1 - D) for i = 1, ..., L, the
choice of D determines whether or not the terms of p(k) including information about
the direct wave are included in the optimal weights, w*.

In  particular,  examining  (6.6)  and  (6.12)  reveals  that  if  no  reflections  occur  within 
D  samples  of  the  direct  wave,  then  p,  and  consequently  w*,  depend  only  on  the 
reflections. This has important implications for the performance of the system in
the  presence  of  target  reflections.  If  D  is  less  than  the  interval  between  the  direct 
wave  and  the  first  reflection,  the  optimal  weights  will  be  the  same  as  if  no  direct 
wave  were  present,  and  no  cancellation  of  the  direct  wave  can  occur,  despite  the 
fact  that  the  target  reflections  violate  one  of  the  two  basic  assumptions  of  LCMV 
beamformers.  This  is  an  extension  of  the  observation  that  target  cancellation  due  to 
target  reflections  can  be  avoided  by  setting  D  =  0  (Hoffman  et  al.,  1994). 

^Examples  of  this  decomposition  are  given  in  Sec.  6.5.2. 




The analysis leading to the important result in the previous paragraph was based
on two assumptions: that the direct wave is equal in the two microphones and that
the source is zero-mean white noise. Violation of the first assumption subjects the
direct wave to cancellation due to misalignment, which is a problem regardless of the
level of reverberation and supersedes the problem of cancellation due to reflections.
(Making adaptive arrays robust to misalignment was a major motivation behind the
modifications proposed in Chs. 4 and 5.) Violation of the second assumption
complicates the analysis because the optimal weights, w*, cannot be determined solely
from the source-to-microphone impulse responses and the parameters L and D. In
this case, the optimal weights will also depend on the autocorrelation of the source
signal, as indicated in (6.8) and (6.9). When the target signal is voiced speech (which
is correlated for short lags), the above result must be modified to state that no
significant cancellation of the direct wave can occur when D is less than the sum of
two quantities: the interval between the direct wave and the first reflection and the
maximum lag for which the source exhibits substantial correlation.


6.3  Methods 

In order to study the effect of target signal reflections, several source-to-microphone
room impulse responses are used. Two simple types of impulse responses are a direct
wave with a single reflection and a direct wave with two reflections. In both cases,
the direct wave is aligned in time and of equal amplitude in the two microphones,
simulating a perfectly aligned array.

The single-reflection impulse responses consist of a direct wave of unit amplitude
followed by a single reflection arriving at the microphones with a relative delay of
one sample. The purpose of the relative delay is to make the reflection appear to
arrive from a direction other than straight ahead.^ This is similar to the simple case
analyzed by Lu and Clarkson (1993) for an adaptive noise canceller. This case is

^A relative delay of one sample makes the reflection appear to arrive from the azimuthal angle
θ = arcsin(c T_s / d), where c is the speed of sound, T_s is the sampling period, and d is the spacing
between microphones.

described by the equations

    h_{t0}(n) = \delta(n) + a \delta(n - d_1 - 1)    (6.13)
    h_{t1}(n) = \delta(n) + a \delta(n - d_1).    (6.14)

An example of single-reflection impulse responses is illustrated in Fig. 6.2(a) for d_1 =
50 and a = 0.2.
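
As a worked illustration, offered here as a sketch rather than as part of the original derivation, the single-reflection responses (6.13) and (6.14) can be substituted into (6.11) and (6.12). The sum and difference of the two responses are written out first, and the bracketed groupings in the last line are an interpretive labeling of which terms involve the direct wave.

```latex
\begin{align*}
h_{t0}(n) + h_{t1}(n) &= 2\delta(n) + a\,\delta(n - d_1) + a\,\delta(n - d_1 - 1) \\
h_{t0}(n) - h_{t1}(n) &= a\,\bigl[\delta(n - d_1 - 1) - \delta(n - d_1)\bigr] \\
r(k) &= \tfrac{a^2}{4}\,\bigl[2\delta(k) - \delta(k - 1) - \delta(k + 1)\bigr] \\
p(k) &= \underbrace{\tfrac{a^2}{4}\,\bigl[\delta(k + 1) - \delta(k - 1)\bigr]}_{\text{reflection only}}
      + \underbrace{\tfrac{a}{2}\,\bigl[\delta(k + d_1 + 1) - \delta(k + d_1)\bigr]}_{\text{direct wave and reflection}}
\end{align*}
```

According to (6.6), the vector p samples p(k) at lags k = -D, ..., L - 1 - D. The terms involving the direct wave sit at k = -d_1 and k = -d_1 - 1, so they enter p only once D reaches d_1, which is consistent with the condition on D discussed in Sec. 6.2.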

The  two-reflection  impulse  responses  consist  of  a  direct  wave  of  unit  amplitude 
followed  by  two  reflections  of  the  same  amplitude  but  with  different  relative  delays. 
This  case  is  described  by  the  equations 

    h_{t0}(n) = \delta(n) + a \delta(n - d_1 - 1) + a \delta(n - d_2 + 2)    (6.15)
    h_{t1}(n) = \delta(n) + a \delta(n - d_1) + a \delta(n - d_2).    (6.16)

An example of two-reflection impulse responses is illustrated in Fig. 6.2(b) for d_1 = 50,
d_2 = 70 and a = 0.2.

In addition to the simple impulse responses described above, two additional pairs
of impulse responses were studied. These impulse responses represent moderately
reverberant and strongly reverberant rooms and are shown in Figs. 6.2(c) and 6.2(d).
They were generated by the room simulation as described in Sec. 3.2.

For each pair of source-to-microphone impulse responses, the
source-to-system-output impulse response was computed for the system shown in Fig. 6.1. The filter
length was L = 1000 and the primary delay varied from D = 0 to D = 990 for each
condition. The target source was assumed to be stationary zero-mean white noise. R
and p were determined from (6.4), (6.6), (6.11) and (6.12), and the optimal weights
were computed according to (6.3) using the Levinson algorithm (Golub and van Loan,
1983).
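
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce the results: it uses a much shorter filter (L = 16) and an earlier reflection (d_1 = 5) than the values considered in this chapter, it solves (6.3) by plain Gaussian elimination instead of the Levinson algorithm, and all function names are hypothetical.

```python
def correlate(u, v, k):
    """Return sum over n of u(n + k) * v(n) for finite sequences starting at n = 0."""
    return sum(u[n + k] * v[n] for n in range(len(v)) if 0 <= n + k < len(u))

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def system_impulse_response(h0, h1, L, D):
    """Source-to-output impulse response with optimal weights, white-noise source."""
    hs = [x + y for x, y in zip(h0, h1)]                   # h_t0 + h_t1
    hd = [x - y for x, y in zip(h0, h1)]                   # h_t0 - h_t1
    r = lambda k: 0.25 * correlate(hd, hd, k)              # (6.11)
    p = lambda k: 0.25 * correlate(hs, hd, k)              # (6.12)
    R = [[r(i - j) for j in range(L)] for i in range(L)]   # (6.4)
    pvec = [p(i - D) for i in range(L)]                    # (6.6), zero-based i
    w = solve(R, pvec)                                     # (6.3)
    out = [0.0] * (len(h0) + max(L, D))
    for n, v in enumerate(hs):                             # delayed primary path
        out[n + D] += 0.5 * v
    for i, wi in enumerate(w):                             # filtered reference path
        for n, v in enumerate(hd):
            out[n + i] -= 0.5 * wi * v
    return out

# Toy single-reflection case following (6.13) and (6.14), with small assumed values.
a, d1, L = 0.2, 5, 16
h0 = [0.0] * (d1 + 2); h0[0] = 1.0; h0[d1 + 1] = a
h1 = [0.0] * (d1 + 2); h1[0] = 1.0; h1[d1] = a
for D in (2, 8):
    out = system_impulse_response(h0, h1, L, D)
    print(D, round(out[D], 4))   # direct-wave sample of the output
```

In this toy case the direct-wave sample of the output remains at 1 for D = 2 (D < d_1) and falls to about 0.07 for D = 8 (D > d_1), which mirrors the behavior predicted in Sec. 6.2.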

To  assess  the  performance  of  the  system  for  each  condition,  the  source-to-system- 
output  impulse  response  was  compared  to  the  source-to-primary  impulse  response. 
For  each  of  these  impulse  responses  the  powers  of  the  direct  and  reflected  portions  of 

Figure 6-2: Examples of source-to-microphone impulse responses described in the
text: (a) single-reflection impulse responses described by (6.13) and (6.14), (b)
two-reflection impulse responses described by (6.15) and (6.16), (c) simulated room
impulse responses described in Sec. 3.2 with wall absorption of 0.6 (moderate
reverberation), and (d) simulated room impulse responses described in Sec. 3.2 with wall
absorption of 0.2 (strong reverberation).

the impulse response were calculated. The direct power was based on the power in
the  five  samples  of  the  impulse  response  surrounding  the  known  index  of  the  direct 
wave,  while  the  reflected  power  was  calculated  based  on  the  power  in  the  remainder  of 
the  impulse  response.  The  gain  in  direct  wave  power  due  to  the  system  is  computed 
from  the  ratio  of  direct  wave  power  in  the  system  output  to  direct  wave  power  in  the 
primary  channel.  This  quantity  is  0  dB  when  no  direct  wave  cancellation  occurs  and 
negative  when  cancellation  of  the  direct  wave  occurs.  The  gain  in  reflected  power 
due  to  the  system  is  calculated  in  a  similar  fashion,  and  negative  values  of  the  gain 
indicate  attenuation  of  the  target  reflections.®  These  measures,  based  on  the  powers  in 
segments  of  the  impulse  responses,  differ  from  AT{Tdi)  and  Ar(rr)  defined  in  Sec.  3.3 
and,  unlike  those  measures,  provide  an  accurate  indication  of  the  effect  of  the  system 
even  when  the  output  includes  cancellation  of  direct  target  based  on  target  reflections. 
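
The two performance measures can be sketched as follows. This is a minimal illustration with made-up impulse responses; the function names are hypothetical, the five-sample window is clipped at the ends of the response, and the sketch assumes that none of the powers is zero.

```python
import math

def direct_and_reflected_power(h, direct_index, half_width=2):
    """Split the power of impulse response h into direct and reflected parts."""
    lo = max(0, direct_index - half_width)
    hi = min(len(h), direct_index + half_width + 1)
    direct = sum(v * v for v in h[lo:hi])      # five samples around the direct wave
    total = sum(v * v for v in h)
    return direct, total - direct              # remainder counts as reflected power

def gains_db(h_primary, idx_primary, h_output, idx_output):
    """Gains in direct and reflected power of the output relative to the primary."""
    dp, rp = direct_and_reflected_power(h_primary, idx_primary)
    do, ro = direct_and_reflected_power(h_output, idx_output)
    return 10.0 * math.log10(do / dp), 10.0 * math.log10(ro / rp)

# Made-up example: direct wave preserved, reflections halved in amplitude.
h_primary = [1.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.1]
h_output = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.05, 0.05]  # delayed by D = 3
direct_gain, reflected_gain = gains_db(h_primary, 0, h_output, 3)
print(round(direct_gain, 2), round(reflected_gain, 2))
```

Here the direct gain is 0 dB because the direct wave is untouched, while halving the reflection amplitudes gives a reflected gain of about -6 dB.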


6.4  Results 

Figures 6.3 and 6.4 show the gains in direct and reflected powers for the four pairs of
impulse responses. The solid lines in the left half of Fig. 6.3 show the gains in direct
and reflected target powers, as a function of primary delay $D$, for the case of a single
reflection described by (6.13) and (6.14) with $a = 0.2$ and $d_1 = 50$. As expected from
the analysis in Sec. 6.2, when $D < d_1$ the direct wave is preserved and the system
provides substantial cancellation of the reflection. When $D > d_1$, the system cancels
both the direct wave and the reflection. The cancellation of the reflection is reduced
because the system minimizes the total output power, and since the direct wave
contributes more to the total output power, that quantity is minimized by providing
more cancellation of the direct wave and less cancellation of the reflection. Except
for the large change that occurs at $D = d_1$, the system performance is independent
of $D$.

The solid lines in the right half of Fig. 6.3 show the gains in direct and reflected
target powers, as a function of primary delay $D$, for the case of two reflections
described by (6.15) and (6.16) with $a = 0.2$, $d_1 = 50$ and $d_2 = 70$. As expected, when
$D < d_1$ the direct wave is preserved, and when $D > d_1$, both the direct wave and
the reflections are cancelled to some degree. Unlike the case of a single reflection, the
degree of cancellation varies with $D$, in addition to the large change that occurs at
$D = d_1$. The cause of these variations will be investigated in Sec. 6.5.2.

^Note that these gains are based on a different reference than in the remainder of this work,
where the gain is relative to a single microphone signal. Here the gain is relative to the primary
signal, so that it does not include attenuation of reflections due to averaging the microphone signals.

Figure 6.4 shows the gain in direct and reflected target powers, as a function of
primary delay $D$, for the simulated room impulse responses shown in Figs. 6.2(c) and
6.2(d). The general trends in these results are similar to those seen in Fig. 6.3 with
the simple room impulse responses. For both pairs of impulse responses, the first
reflection occurs 46 samples after the direct wave, but it is aligned in time at the two
microphones. The second reflection occurs 66 samples after the direct wave, and is
not aligned at the two microphones. As expected, when $D < 66$ the direct wave is
preserved and only the reflections are cancelled. When $D > 66$ both the direct wave
and the reflections are cancelled.

The  dotted  lines  in  Figs.  6.3  and  6.4  show  the  effect  of  the  system  on  the  reflections 
in  the  absence  of  the  direct  wave.  Because  the  reflections  are  signals  arriving  at  the 
array  from  off-axis  directions,  the  ability  of  the  system  to  cancel  target  reflections  is 
also  indicative  of  its  ability  to  cancel  off-axis  jammers.  The  dotted  lines  in  Figs.  6.3 
and  6.4  show  that  maximum  cancellation  of  off-axis  sources  is  obtained  for  a  broad 
range  of  values  of  D  >  0.  Overall,  the  results  shown  in  Figs.  6.3  and  6.4  show  that 
there  is  a  clear  advantage  to  using  values  of  D  substantially  greater  than  zero,  but 
less  than  the  delay  to  the  first  reflection. 

Figure 6.5 shows the source-to-delayed-primary and source-to-system-output impulse
responses for the simulated room with strong reverberation (Fig. 6.2(d)), for
$D = 50$ and for $D = 500$. Clearly, for $D = 50$ the direct wave is preserved, and some
reflections are noticeably attenuated. For $D = 500$, both the direct wave and the
reflections are attenuated.

In order to investigate the variations in performance that occur with changes in
the direct-to-reverberant ratio, modified source-to-microphone impulse responses were
developed based on the simulated impulse responses of the moderately reverberant
room (Fig. 6.2(c)). Those impulse responses were separated into components consisting
of either the direct wave or the reflections, and the direct-wave component was
scaled before recombining with the reflections to produce impulse responses with the
same structure, but different direct-to-reverberant ratios. Figure 6.6 shows the gain
in direct and reflected target powers for these modified pairs of impulse responses.
Because the optimal weights are selected to minimize the total output power, the
results show relatively more cancellation of the stronger component, that is, more
cancellation of reflections at the lower direct-to-reverberant ratio (dotted lines) and
more cancellation of the direct wave at a high direct-to-reverberant ratio (solid lines).
At the high direct-to-reverberant ratio, for some values of $D$, the system's attempt
to minimize the stronger direct component results in amplification of the reflections
(gain greater than 0 dB).

Figure 6-3: Gain of direct target and target reflections as a function of primary delay
for impulse responses with a single reflection (left) and with two reflections (right).
Solid lines indicate results when both the direct and reflected components are present;
dotted lines indicate results for reflected components alone.

Figure 6-4: Gain of direct target and target reflections as a function of primary delay
for simulated room impulse responses with moderate (left) and strong (right)
reverberation. Solid lines indicate results when both the direct and reflected components
are present; dotted lines indicate results for reflected components alone.

[Figure 6-5 panels: columns labeled "source-to-system output" and "source-to-delayed
primary"; rows labeled D = 50 and D = 500; horizontal axes: sample number.]

Figure 6-5: Source-to-delayed primary impulse responses for simulated room impulse
responses with moderate reverberation for primary delays of (a) $D = 50$ and (b)
$D = 500$. Source-to-system output impulse responses for simulated room impulse
responses with strong reverberation for primary delays of (c) $D = 50$ and (d) $D = 500$.

In  addition  to  the  results  shown  here,  the  gain  in  direct  and  reflected  powers  was 
computed  for  other  impulse  responses  and  for  the  addition  of  uncorrelated  sensor 
noise  in  the  two  microphone  signals.  The  general  trends  observed  under  a  variety  of 
conditions  were  similar  to  the  results  shown  above. 


6.5  Discussion 

6.5.1  Summary 

The results of the previous section suggest that a simple solution to the problem
presented by reverberant target is to set the primary delay to zero, as suggested by
Hoffman et al. (1994). However, the results also show that, in general, the cancellation
of target reflections (and therefore cancellation of jammer signals) improves with
increasing $D$. This improvement is because nonzero primary delays permit non-causal
responses on the part of the adaptive filter, which was the initial motivation
for including a primary channel delay (Widrow and Stearns, 1985). As a result, it is
advisable to use a primary delay that is nonzero, but relatively small.
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Ideally,  the  primary  delay  should  always  be  less  than  the  number  of  samples 
between the direct wave and the first reflection. However, since these systems must
operate in a variety of acoustic environments, it is likely that, if $D$ is nonzero, on
some occasions early reflections will arrive at the array within $D$ samples of the
direct wave. Fortunately, when there are multiple reflections, the system is robust to
some reflections within $D$ samples of the direct wave. This robustness exists provided
that there are additional reflections that arrive more than $D$ samples after the
direct wave. For example, there is relatively little cancellation of the direct wave in
Fig. 6.4 for $66 < D < 120$. Those figures illustrate the worst case scenario for target
cancellation, because no jammer signal is present and the filter weights are optimum.

Since the simulated room impulse responses have a 10 kHz sampling rate, reflections
arriving at the microphones 50-66 samples after the direct wave correspond
to delays of 5.0-6.6 ms, or additional path lengths of roughly 2 meters at the speed of
sound. Although earlier reflections can be expected in real rooms with furnishings, they
will be few relative to the total reflections, and are not expected to change the general
trends seen in Fig. 6.4.
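The conversion from samples to delay and path length can be checked with a short calculation. This is only a sketch: the speed of sound of 343 m/s is an assumed room-temperature value that the text does not state, and the function names are illustrative.

```python
FS_HZ = 10_000                    # sampling rate of the simulated impulse responses
SPEED_OF_SOUND_M_PER_S = 343.0    # assumed value; not given in the text

def samples_to_ms(n_samples, fs_hz=FS_HZ):
    """Delay in milliseconds corresponding to n_samples at fs_hz."""
    return 1000.0 * n_samples / fs_hz

def samples_to_extra_path_m(n_samples, fs_hz=FS_HZ, c=SPEED_OF_SOUND_M_PER_S):
    """Extra path length (relative to the direct wave) for a given delay."""
    return c * n_samples / fs_hz

# Delays of 50 and 66 samples after the direct wave.
delays_ms = [samples_to_ms(n) for n in (50, 66)]
paths_m = [samples_to_extra_path_m(n) for n in (50, 66)]
```

Under the assumed speed of sound, the two delays correspond to extra path lengths of about 1.7 m and 2.3 m, consistent with the "roughly 2 meters" quoted above.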

Kompis and Dillier (1991) empirically optimized the delay, $D$, where the optimal
parameter value was defined as that which provided the largest gain in TJR. They
varied $D$ for a single target/jammer configuration with various filter lengths in several
reverberant rooms. They conclude that for $L < 128$, the value of $D$ has no great
influence on performance, while for $L > 512$ the 'optimal' delay is 25-50% of the
filter length. For the rooms that they studied, the time between the direct wave and
first reflection is not known. However, their results are consistent with the current
analysis. For relatively short filters, the delay does not matter because no reflections
arrive within $L$ samples of the direct wave. For relatively long filters, their suggested
range of $L/4 < D < L/2$ presumably results from the tradeoff between two factors that
improve performance: relatively short $D$ minimizes the number of reflections arriving
within $D$ samples of the direct wave and $D = L/2$ centers the non-causal impulse
response.

Finally,  to  characterize  the  robustness  of  the  system  to  some  reflections  within  D 
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samples  of  the  direct  wave,  it  might  be  useful  to  quantify  the  relationship  between 
the  degree  of  direct  wave  cancellation  and  some  measure  of  the  reflections  occurring 
within  D  samples  of  the  direct  wave  and  more  than  D  samples  after  the  direct  wave. 
However,  the  following  analysis  shows  that  for  room  impulse  responses  with  more 
than  one  reflection,  other  factors  influence  the  relationship  between  D  and  system 
performance,  and  the  performance  cannot  be  explained  solely  on  the  basis  of  the 
reflections  arriving  within  D  samples  of  the  direct  wave. 

6.5.2  Analysis 

The lack of direct wave cancellation when $D$ is less than the time between the direct
wave and the first reflection was predicted in Sec. 6.2 and demonstrated in Sec. 6.4.
Here, that result is analyzed for the cases of one and two reflections described in
Sec. 6.3.

First, it will be useful to define some additional quantities. In Sec. 6.2, it was
shown that if the direct wave is perfectly aligned in the two channels, the
autocorrelation function, and consequently $\mathbf{R}$, consist only of terms related to the
reflections, while the cross-correlation function $p(k)$ and vector $\mathbf{p}$ consist of terms
related to both the direct wave and the reflections. It will be useful to decompose
the function $p(k)$ into a component that depends only on the reflections ($p_1(k)$) and
a component that depends on both the direct wave and the reflections ($p_2(k)$) by
defining

$$p(k) = p_1(k) + p_2(k). \tag{6.17}$$

Then

$$\mathbf{p} = \mathbf{p}_1 + \mathbf{p}_2, \tag{6.18}$$

where $\mathbf{p}_1$ and $\mathbf{p}_2$ are $L \times 1$ dimensional vectors based on $p_1(k)$ and $p_2(k)$ as in (6.6).
Using (6.3), the optimal weights also consist of two components, that is,

$$\mathbf{w}^* = \mathbf{R}^{-1}\mathbf{p} = \mathbf{R}^{-1}\mathbf{p}_1 + \mathbf{R}^{-1}\mathbf{p}_2 = \mathbf{w}_1^* + \mathbf{w}_2^*, \tag{6.19}$$

where the weight vector $\mathbf{w}_1^*$ is only based on the reflections. In fact, the vector $\mathbf{w}_1^*$
is equivalent to the weights that would result if the direct wave were absent and the
source-to-microphone impulse responses consisted solely of off-axis reflections.

In addition to the fact that $\mathbf{R}$ depends only on the reflections, the matrix $\mathbf{R}$
determined according to (6.4) and (6.11) does not depend on the primary delay, $D$.
On the other hand, the cross-correlation vector, $\mathbf{p}$, determined according to (6.6) and
(6.12) does depend on the direct wave and $D$.

For the case of a single reflection, substituting (6.13) and (6.14) into (6.12) gives

$$
\begin{aligned}
p(k) &= \tfrac{1}{4}\bigl(2\delta(k) + a\,\delta(k - d_1 - 1) + a\,\delta(k - d_1)\bigr) \ast \bigl(a\,\delta(k + d_1 + 1) - a\,\delta(k + d_1)\bigr) \\
     &= \tfrac{a}{4}\bigl(2\delta(k + d_1 + 1) - 2\delta(k + d_1) + a\,\delta(k + 1) - a\,\delta(k - 1)\bigr),
\end{aligned}
\tag{6.20}
$$

which can be separated into the two terms

$$p_1(k) = \frac{a^2}{4}\bigl(\delta(k + 1) - \delta(k - 1)\bigr) \tag{6.21}$$

and

$$p_2(k) = \frac{a}{2}\bigl(\delta(k + d_1 + 1) - \delta(k + d_1)\bigr). \tag{6.22}$$

For any value of $D$ in the range $0 < D < L - 1$, $\mathbf{p}_1$ has two nonzero entries that
occur in positions $D$ and $D + 2$. If $0 < D < d_1$, then $\mathbf{p}_2 = \mathbf{0}$ and $\mathbf{w}^* = \mathbf{w}_1^*$ (the
optimal weights for cancelling reflections alone). If $D > d_1$, then $\mathbf{p}_2$ is nonzero, and
the resulting weights depend on (and cancel) both the direct wave and the reflections.
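The closed forms (6.21) and (6.22) can be checked numerically by carrying out the convolution in (6.20) on sequences of unit impulses. The sketch below does only that; the helper names are illustrative, and the sequences are written directly from the two factors of (6.20) rather than from (6.12)-(6.14), which are not reproduced here.

```python
import numpy as np

def impulses(terms, k_min, k_max):
    """Sequence on lags k_min..k_max holding gain g at lag k0 for each (k0, g)."""
    seq = np.zeros(k_max - k_min + 1)
    for k0, g in terms:
        seq[k0 - k_min] += g
    return seq

def single_reflection_cross_correlation(a, d1):
    """Evaluate p(k) by convolving the two factors in the first line of (6.20)."""
    K = d1 + 1
    first = 0.25 * impulses([(0, 2.0), (d1 + 1, a), (d1, a)], -K, K)
    second = impulses([(-(d1 + 1), a), (-d1, -a)], -K, K)
    p = np.convolve(first, second)          # result covers lags -2K..2K
    lags = np.arange(-2 * K, 2 * K + 1)
    return lags, p

def closed_form_components(a, d1, lags):
    """p1(k) and p2(k) as given by (6.21) and (6.22)."""
    p1 = np.zeros(len(lags))
    p2 = np.zeros(len(lags))
    p1[lags == -1] = a * a / 4.0            # delta(k + 1)
    p1[lags == 1] = -a * a / 4.0            # -delta(k - 1)
    p2[lags == -(d1 + 1)] = a / 2.0         # delta(k + d1 + 1)
    p2[lags == -d1] = -a / 2.0              # -delta(k + d1)
    return p1, p2

lags, p = single_reflection_cross_correlation(a=0.2, d1=50)
p1, p2 = closed_form_components(0.2, 50, lags)
agrees = np.allclose(p, p1 + p2)
```

The terms at lag zero cancel, leaving the four impulses of (6.20): two near lag zero (reflection-only) and two near lag $-d_1$ (direct wave with reflection).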

Therefore, for the case of a single reflection, the dependence of performance on
the primary delay $D$ observed in the left half of Fig. 6.3 is completely explained by
whether or not a particular value of $D$ results in a cross-correlation vector $\mathbf{p}$ that
includes nonzero elements of $p_2(k)$, which contain information about the correlation
between the direct wave and the reflections. However, this result does not generalize
to the case of two or more reflections, as shown below.
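The single-reflection behavior can also be reproduced with a small numerical experiment. The following is a sketch under stated assumptions, not the procedure used to generate Fig. 6.3: the source is taken to be white with unit power, the primary and reference responses are taken as half the sum and half the difference of the two microphone responses (a form inferred from the factors in (6.20)), entry $i$ of the cross-correlation vector is taken as $p(i - D)$, and $L = 100$ is an arbitrary illustrative filter length.

```python
import numpy as np

def lagged_product(x, y, k):
    """Return sum_m x[m] * y[m - k] for finite sequences indexed from 0."""
    if k >= 0:
        n = min(len(x) - k, len(y))
        return float(np.dot(x[k:k + n], y[:n])) if n > 0 else 0.0
    n = min(len(x), len(y) + k)
    return float(np.dot(x[:n], y[-k:-k + n])) if n > 0 else 0.0

def gains_for_delay(a=0.2, d1=50, D=20, L=100):
    """Direct and reflected gains (dB) with optimal weights, single reflection."""
    # Assumed microphone responses: direct wave aligned, reflection offset by one sample.
    h1 = np.zeros(d1 + 2); h1[0] = 1.0; h1[d1] = a
    h2 = np.zeros(d1 + 2); h2[0] = 1.0; h2[d1 + 1] = a
    hp = 0.5 * (h1 + h2)          # primary (sum) channel, before the delay D
    hr = 0.5 * (h2 - h1)          # reference (difference) channel: reflections only

    # Autocorrelation matrix R (Toeplitz) and cross-correlation vector p.
    r = np.array([lagged_product(hr, hr, k) for k in range(L)])
    idx = np.arange(L)
    R = r[np.abs(idx[:, None] - idx[None, :])]
    p = np.array([lagged_product(hp, hr, i - D) for i in range(L)])
    w = np.linalg.solve(R, p)     # optimal weights, R w = p

    # Source-to-output response: delayed primary minus filtered reference.
    n_out = max(D + len(hp), L + len(hr) - 1)
    delayed = np.zeros(n_out)
    delayed[D:D + len(hp)] = hp
    g = delayed.copy()
    g[:L + len(hr) - 1] -= np.convolve(w, hr)

    # Five-sample direct window around the delayed direct wave; remainder is reflected.
    direct = np.abs(np.arange(n_out) - D) <= 2
    direct_db = 10.0 * np.log10(np.sum(g[direct] ** 2) / np.sum(delayed[direct] ** 2))
    reflected_db = 10.0 * np.log10(np.sum(g[~direct] ** 2) / np.sum(delayed[~direct] ** 2))
    return direct_db, reflected_db

short_delay = gains_for_delay(D=20)   # D < d1: direct wave should be preserved
long_delay = gains_for_delay(D=60)    # D > d1: direct wave should be cancelled
```

With these assumptions, the direct gain is 0 dB and the reflected gain is negative for the shorter delay, while the direct gain becomes strongly negative for the longer delay, in line with the trend described for the left half of Fig. 6.3.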

For the case of two reflections, separating the direct wave from the reflections and
substituting (6.15) and (6.16) into (6.12) gives

$$
\begin{aligned}
p_1(k) = \frac{a^2}{4}\bigl(&-\delta(k + d_2 - d_1) - \delta(k + d_2 - d_1 - 1) \\
&+ \delta(k + d_2 - d_1 - 2) + \delta(k + d_2 - d_1 - 3) \\
&- \delta(k + 2) + \delta(k + 1) - \delta(k - 1) + \delta(k - 2) \\
&+ \delta(k - d_2 + d_1 + 3) - \delta(k - d_2 + d_1 + 2) \\
&+ \delta(k - d_2 + d_1 + 1) - \delta(k - d_2 + d_1)\bigr)
\end{aligned}
\tag{6.23}
$$

and

$$p_2(k) = \frac{a}{2}\bigl(-\delta(k + d_2) + \delta(k + d_2 - 2) + \delta(k + d_1 + 1) - \delta(k + d_1)\bigr). \tag{6.24}$$

The twelve terms in (6.23) are clustered into three groups of four impulses. For
any value of $D$ in the range $1 < D < L - d_2 + d_1 - 1$, $\mathbf{p}_1$ has at least eight nonzero
entries, corresponding to the last eight terms in (6.23). If $D > d_2 - d_1$ then all twelve
terms of (6.23) occur as nonzero entries in $\mathbf{p}_1$. Therefore, it is expected that the
system will show a change in the performance near $D = d_2 - d_1$.^

The four terms in (6.24) comprise two groups of two impulses. If $0 < D < d_1$,
then $\mathbf{p}_2 = \mathbf{0}$ and $\mathbf{w}^* = \mathbf{w}_1^*$ (the optimal weights for cancelling reflections alone). If
$d_1 < D < d_2 - 3$ then $\mathbf{p}_2$ contains two nonzero terms, and the resulting weights depend
on (and cancel) both the direct wave and the reflections. If $D > d_2 + 1$, then $\mathbf{p}_2$
contains four nonzero terms, and the resulting weights will cancel the direct wave
more effectively than in the previous case.
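The counting argument above can be illustrated by listing the lags of the impulses in (6.23) and (6.24) and checking which of them fall inside the cross-correlation vector for a given primary delay. This is a sketch only: it assumes that entry $i$ of the vector holds $p(i - D)$ for $i = 0, \ldots, L - 1$ (a convention consistent with the "positions $D$ and $D + 2$" statement for the single-reflection case, but not confirmed against (6.6)), and it uses an illustrative filter length of $L = 100$. Exact boundary values of $D$ may shift by a sample under a different indexing convention.

```python
def two_reflection_lags(d1, d2):
    """Lags k at which p1(k) and p2(k) are nonzero, read off (6.23) and (6.24)."""
    g = d2 - d1
    p1_lags = [-g, -g + 1, -g + 2, -g + 3,      # first group of four impulses
               -2, -1, 1, 2,                    # group centered on lag zero
               g - 3, g - 2, g - 1, g]          # last group of four impulses
    p2_lags = [-d2, -d2 + 2, -d1 - 1, -d1]      # two groups of two impulses
    return p1_lags, p2_lags

def count_entries(lags, D, L):
    """Number of impulses landing inside the vector, assuming entry i = p(i - D)."""
    return sum(1 for k in lags if 0 <= k + D <= L - 1)

def nonzero_counts(D, d1=50, d2=70, L=100):
    """Return (entries in p1, entries in p2) for primary delay D."""
    p1_lags, p2_lags = two_reflection_lags(d1, d2)
    return count_entries(p1_lags, D, L), count_entries(p2_lags, D, L)

counts = {D: nonzero_counts(D) for D in (10, 20, 60, 70)}
```

For $d_1 = 50$ and $d_2 = 70$, the counts step from eight to twelve entries of $\mathbf{p}_1$ near $D = 20$, and from zero to two to four entries of $\mathbf{p}_2$ as $D$ passes $d_1$ and then $d_2$, which is the pattern the text uses to predict the changes in performance.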

For the case of $d_1 = 50$ and $d_2 = 70$, the above analysis predicts an improvement
in cancellation of reflections near $D = d_2 - d_1 = 20$, no direct wave cancellation for
$D < 50$, some direct wave cancellation when $50 < D < 70$, and increased direct wave
cancellation when $D > 70$. The performance observed in the right half of Fig. 6.3 is
in accordance with these predictions.

^When $d_2 - d_1 - 3 < D < d_2 - d_1$, $\mathbf{p}_1$ has between nine and eleven nonzero entries. For simplicity,
when impulsive terms of cross-correlation functions occur in clusters, the details of such transitions
will be ignored. Hence the change in performance is expected near, but not exactly at, $D = d_2 - d_1$.

However, the right half of Fig. 6.3 shows that there are additional changes in
performance of magnitude similar to those predicted by the analysis, occurring at
$D = 40$ and at intervals of 20 starting with $D = 90$. This dependence of performance
on $D$ cannot be attributed to the inclusion of additional terms in $\mathbf{p}_1$ and $\mathbf{p}_2$. Rather,
it is due to the fact that varying $D$ affects the position of the nonzero entries in $\mathbf{p}_1$
and $\mathbf{p}_2$, which determines which columns of the matrix $\mathbf{R}^{-1}$ are used to produce the
filter weights. The structure of $\mathbf{R}^{-1}$ depends only on the reflections, and the filter
length $L$, not on the direct wave or the primary delay $D$.

Therefore,  this  section  concludes  with  the  observation  that,  although  the  presence 
or  absence  of  direct  target  cancellation  can  be  predicted  based  on  the  presence  or 
absence  of  reflections  within  D  samples  of  the  direct  wave,  it  is  not  possible  to  predict 
other  general  variations  in  performance  with  D  when  multiple  reflections  exist. 

6.6  Conclusion 

The effect of reflections of the desired target source on the performance of a
two-microphone generalized sidelobe canceller was studied. Simple source-to-microphone
impulse responses were created to account for a small number of reflections. In
addition, performance was studied using the simulated room impulse responses described
in Sec. 3.2.

The  results  show  that  the  primary  channel  delay,  D,  has  a  large  impact  on  the 
system’s  ability  to  cancel  the  direct  target  based  on  target  reflections.  Direct  target 
cancellation  is  eliminated  entirely  when  the  primary  delay  is  shorter  than  the  interval 
between  the  arrival  of  the  direct  wave  and  the  arrival  of  the  first  reflection  at  the 
microphone array. Direct target cancellation due to target reflections is most
pronounced when there are a small number of reflections and those reflections arrive at
the array within the time window determined by the primary delay. Direct target
cancellation is less severe when there are a large number of reflections and only a
small fraction of the reflections arrive within the time window determined by the
primary delay. This robustness, together with the other benefits derived from using
nonzero values of the primary delay, suggests that it is advisable to use a primary
delay that is nonzero, but relatively small.

An attempt was made to quantify the robustness of the system when a small
fraction of the total number of reflections occur within $D$ samples of the direct wave.
However, when more than one reflection exists, the relationship between system
performance and primary delay depends on the structure of the inverse of the
autocorrelation matrix, and the system performance cannot be explained solely on the basis
of the reflections arriving within $D$ samples of the direct wave.
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Chapter  7 


Simulation  Results 


7.1  Introduction 

The purpose of this chapter is to evaluate adaptive-array hearing aids in a variety of
acoustic environments. The systems considered are based on the generalized sidelobe
canceller proposed by Griffiths and Jim, described in Sec. 2.1. These systems are
implemented in computer simulations and evaluated using $G_I$, the physical measure
of intelligibility-weighted gain described in Sec. 3.3.

The  results  of  these  simulations  will  provide  answers  to  the  following  questions: 

1.  Are  the  modifications  suggested  in  Chs.  4  and  5  effective  against  the  problems 
of  misadjustment  and  misalignment  when  integrated  into  a  complete  system? 

2.  What  level  of  performance  can  be  expected  with  practical  systems  in  a  variety 
of  acoustic  environments? 

3.  How  is  performance  in  various  environments  affected  by  design  parameters  such 
as  filter  length  and  number  of  microphones? 


7.2  Processing 

The first step required for the computer simulations is to generate signals received
by the microphone arrays for processing by the systems under consideration. This
was accomplished by convolving the source materials described in Sec. 3.1 (IEEE
sentences and SPIN babble sampled at 10 kHz) with the source-to-microphone impulse
responses described in Sec. 3.2. To simulate perfectly aligned arrays, the target source
was located at 0° and the jammer source was at 45° azimuth.^ To simulate misaligned
arrays, the target source was located at 10° and the jammer source was at 55° azimuth.
Two configurations were simulated, a 7-cm array of two omnidirectional microphones
and a 16-cm array of five omnidirectional microphones, as described in Sec. 3.2. Three
levels of wall absorption were used to generate one anechoic and two reverberant
conditions. The two reverberant conditions had direct-to-reverberant ratios of +6 dB
and -2 dB, and will be referred to as moderate and strong reverberation, respectively.
The relative level of the target and jammer sources was varied to produce three
target-to-jammer ratios of -20, 0, and +20 dB.
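The signal-generation step can be sketched as follows. This is an illustration under assumptions rather than the simulation code itself: the function name is hypothetical, the TJR is applied here as a ratio of source powers by scaling the jammer (the text does not specify which source is scaled), and the impulse responses are passed as arrays with one row per microphone. The target and jammer components are kept separate so that the effect of a system on each can be examined individually.

```python
import numpy as np

def microphone_components(target, jammer, h_target, h_jammer, tjr_db):
    """Return target and jammer components at each microphone.

    target, jammer     : 1-D source signals
    h_target, h_jammer : arrays of shape (M, N), one impulse response per microphone
    tjr_db             : desired target-to-jammer ratio of the sources, in dB
    """
    target = np.asarray(target, dtype=float)
    jammer = np.asarray(jammer, dtype=float)

    # Scale the jammer so that the source power ratio equals the requested TJR.
    p_target = np.mean(target ** 2)
    p_jammer = np.mean(jammer ** 2)
    gain = np.sqrt(p_target / (p_jammer * 10.0 ** (tjr_db / 10.0)))
    jammer = gain * jammer

    # Convolve each source with its source-to-microphone impulse responses.
    target_mics = np.stack([np.convolve(target, h) for h in h_target])
    jammer_mics = np.stack([np.convolve(jammer, h) for h in h_jammer])
    return target_mics, jammer_mics

# Hypothetical check with trivial (single-tap) impulse responses and two microphones.
rng = np.random.default_rng(0)
tgt, jam = microphone_components(
    rng.standard_normal(1000), 3.0 * rng.standard_normal(1000),
    np.array([[1.0], [1.0]]), np.array([[1.0], [1.0]]), tjr_db=-20.0)
measured_tjr_db = 10.0 * np.log10(np.mean(tgt[0] ** 2) / np.mean(jam[0] ** 2))
```

The microphone input for a given condition is then the sum of the two components, provided they have the same length.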

These  signals  were  processed  by  two-  and  five-microphone  adaptive  systems  using 
yoked  processors  (Sec.  3.3)  to  determine  the  effect  of  the  system  on  each  of  three  signal 
components:  the  jammer,  the  direct  target,  and  the  target  reflections.  The  algorithms 
evaluated  included  four  combinations  of  processing  options  based  on  modifications 
suggested  in  Chs.  4  and  5,  so  that  each  condition  was  tested  with  the  sum  method 
of  normalizing  the  step-size  parameter,  with  the  correlation  method  of  controlling 
adaptation,  with  both  of  these  modifications,  and  with  neither  modification.  Based 
on  the  results  of  Ch.  6,  the  primary  delay  was  set  to  D  =  50.  Two  adaptive  filter 
lengths  were  considered,  L  =  100  and  L  =  1000,  corresponding  to  10  ms  and  100  ms 
at  the  10  kHz  sampling  rate. 

^Previous work has shown that the performance obtained with a jammer located at 45° is
representative of performance obtained with a single jammer at other angles (Greenberg, 1989).

Figure 7.1 shows a block diagram of the complete system with both modifications.
The sum method, described and analyzed in Ch. 4, consists of normalizing the
adaptive weights according to (4.17) and (4.60). For comparison, the processor was
also tested using the traditional method (4.19). For both methods, the dimensionless
step-size parameter was $a = 0.25$. The signal powers required by (4.19) and (4.60)
were obtained by squaring the reference input and system output and then processing
with first-order recursive lowpass filters. The time constant of the lowpass filter used
on the reference input equaled the length of the adaptive filter (10 ms or 100 ms)
in order to account for the power of the signal in the tapped delay line. The time
constant of the lowpass filter used on the system output was 10 ms, which was found
to be a good value for tracking power fluctuations in speech.
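The power estimates described above (squaring followed by a first-order recursive lowpass filter) can be sketched as follows. The mapping from time constant to filter coefficient, $\exp(-1/(\tau f_s))$, is a common choice and is an assumption here; the text does not give the exact coefficient, and the function name is illustrative.

```python
import numpy as np

def smoothed_power(x, tau_s, fs_hz=10_000.0):
    """Running power estimate: square the signal, then apply a one-pole lowpass.

    tau_s : time constant in seconds (e.g. 0.010 for the system output)
    fs_hz : sampling rate in Hz
    """
    alpha = np.exp(-1.0 / (tau_s * fs_hz))   # assumed coefficient mapping
    out = np.empty(len(x))
    state = 0.0
    for n, value in enumerate(x):
        # Unity-gain recursion: the estimate converges to the mean square value.
        state = alpha * state + (1.0 - alpha) * value * value
        out[n] = state
    return out

# A constant-amplitude input of 2 has power 4; the estimate rises toward that value.
estimate = smoothed_power(np.full(1000, 2.0), tau_s=0.010)
```

With a 10-ms time constant at 10 kHz, the estimate reaches about 63% of its final value after 100 samples, which is the sense in which the filter "tracks" power fluctuations on that time scale.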

The correlation method of controlling adaptation was implemented as described
in Sec. 5.3. For the two-microphone array, the bandpass filter cutoff frequencies were
1643 Hz and 3286 Hz, one octave about the center frequency, 2464 Hz, determined
by (5.30). For the five-microphone array, the bandpass filter cutoff frequencies were
those given in Table 5.1. All of the bandpass filters were 10th-order Butterworth filters.
The hard limiter was included in computing the instantaneous correlation, and the
instantaneous values were smoothed by a first-order recursive lowpass filter with 10-ms
time constant. The threshold correlation value was $\rho_0 = 0$. For the five-microphone
system, the voting method was used.
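The quoted cutoff frequencies are consistent with a one-octave band whose edges average to the center frequency, that is, a lower edge of two thirds and an upper edge of four thirds of the center. This reading is an inference from the numbers in the text, not a restatement of (5.30), and the function name below is illustrative.

```python
def octave_band_edges(center_hz):
    """Edges of a one-octave band (upper = 2 * lower) whose mean is center_hz."""
    lower = 2.0 * center_hz / 3.0
    upper = 4.0 * center_hz / 3.0
    return lower, upper

lower_hz, upper_hz = octave_band_edges(2464.0)
```

For a center frequency of 2464 Hz this gives edges within 1 Hz of the 1643 Hz and 3286 Hz quoted above; the small difference is attributable to rounding of the center frequency.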

For each combination of acoustic condition and processing option, the system
adapted on the same sequence of 40 IEEE sentences paired with 40 matching-length
segments of SPIN babble. The lengths of the sentences varied from 22656 to 38912
samples, with a mean of 29242 samples. When the source signal levels were adjusted to
produce a TJR of 0 dB over all 40 sentences, the TJRs for individual sentences ranged
from -2.5 to 1.9 dB. For processing, the sentences and babble segments were
concatenated to produce a single set of source signals of 117 sec duration. For evaluation,
the performance measures were calculated individually for each sentence-plus-babble
segment. The intelligibility-weighted gain, $G_I$, was computed based on the system
output and the input signal received at one microphone; the two-element array used
the leftmost microphone to obtain the input levels and the five-element array used
the center microphone.

For each condition, the steady-state performance was determined by averaging the
performance measures (in dB) obtained from the last five sentences (36-40).^
Comparing these performance measures to values obtained by averaging the preceding
five sentences (31-35) showed a difference of less than 2 dB for all conditions.
Therefore, the results presented below are not sensitive to minor variations in the speech
materials.

^These five sentences had lengths of 26880, 27648, 25856, 24448 and 31616 samples and TJRs of
1.4, 1.9, -0.6, -0.5, and 0.9 dB when scaled for TJR=0 dB over all 40 sentences.

Figure 7-1: Block diagram of the generalized sidelobe canceller modified by the
correlation method of controlling adaptation (details shown in Fig. 5.8). The adaptive
filter weights are updated according to (4.17) using either the traditional method
(4.19) or the sum method (4.60) of normalizing the step-size parameter. The system
has $M$ microphones and $M - 1$ adaptive filters, each with $L$ taps.

For most of the conditions considered, the system converged rapidly, and the
performance was roughly constant over all 40 sentences. However, a few of the conditions
considered had much longer convergence times, and for one condition (five-microphone
array with 1000-point filters), the system had not reached steady-state even after 40
sentences. The steady-state performance measures based on the last five sentences
are presented in the next section, while the transient performance will be discussed
in Sec. 7.4.


7.3  Steady-state  performance 

7.3.1  Effect  of  modifications 

Figures 7.2 and 7.3 show the steady-state performance with and without the
modifications. Each plot in these figures shows the intelligibility-weighted gain, $G_I$, versus
the direct-to-reverberant ratio for all four processing options (sum method of
normalizing the step-size parameter, correlation method of controlling adaptation, both
modifications, and neither modification). Figure 7.2 shows the results for 0 dB input
TJR, and Fig. 7.3 shows the results for +20 dB input TJR. Results are not shown for
input TJR of -20 dB because the modifications have little or no effect at low TJR.

As explained in Sec. 3.3, positive $G_I$ values indicate improved intelligibility
(amplification of the target or attenuation of the jammer), while negative $G_I$ values indicate
degraded intelligibility (attenuation of the target or amplification of the jammer).
Figure 7.2 shows that for 0 dB input TJR, the unmodified algorithm provides moderate
gains for all conditions. Performance improves with the addition of either
modification, and adding both modifications provides the best performance. The largest
improvements occur in the anechoic environment, but the modifications also provide
substantial improvements in moderate reverberation.

Figure 7-2: Intelligibility-weighted gain, $G_I$, versus direct-to-reverberant ratio
showing steady-state performance with and without the two modifications. Anechoic
results are shown at a direct-to-reverberant ratio of +20 dB. The input TJR was 0 dB.
The arrays were either perfectly aligned to the straight-ahead target or misaligned
by 10°. The systems tested were two- and five-microphone arrays with 100- and
1000-point adaptive filters.

Figure 7-3: Intelligibility-weighted gain, $G_I$, versus direct-to-reverberant ratio
showing steady-state performance with and without the two modifications. Anechoic
results are shown at a direct-to-reverberant ratio of +20 dB. The input TJR was +20
dB. The arrays were either perfectly aligned to the straight-ahead target or
misaligned by 10°. The systems tested were two- and five-microphone arrays with 100-
and 1000-point adaptive filters.

Figure 7.3 demonstrates the need for the modifications when the input TJR is
high. In this case, the unmodified algorithm performs poorly, with values of $G_I$ as
low as -20 dB. Again, the best performance is obtained with both modifications, the
largest improvements occur in the anechoic environment, and the modifications also
provide substantial improvements in moderate reverberation. Although performance
generally improves with either modification, for the misaligned conditions the
correlation method alone provides much larger benefits than the sum method alone. While
the sum method alone effectively reduces misadjustment when the array is aligned, it
permits substantial target cancellation when the array is misaligned. The interactive
effects of the two modifications on misadjustment and target cancellation have been
explained previously (Greenberg and Zurek, 1992).

From  the  results  shown  in  Figs.  7.2  and  7.3,  it  is  clear  that  the  system  always 
performs  better  with  both  modifications  than  it  does  with  either  modification  alone  or 
with  no  modifications.  This  was  shown  previously  for  two  microphones  with  L  =  100 
in  an  anechoic  environment  (Greenberg  and  Zurek,  1992).  The  current  results  indicate 
that  the  modifications  are  effective  in  anechoic  and  reverberant  environments,  for 
arbitrary  filter  lengths,  and  for  five  as  well  as  two  microphones. 

The  next  section  summarizes  steady-state  performance  results  for  the  algorithm 
utilizing  both  modifications.  The  transient  performance  of  all  four  processing  options 
will  be  considered  in  Sec.  7.4. 

7.3.2  Performance  with  both  modifications 

This  section  considers  in  more  detail  the  performance  of  the  algorithm  with  both 
modifications;  it  examines  the  effect  of  design  parameters  (adaptive  filter  length,  L, 
and  number  of  microphones,  M),  as  well  as  variations  in  the  acoustic  environment 
(degree  of  reverberation,  TJR,  array  alignment).  In  presenting  these  results,  it  will 
be useful first to consider the intelligibility-weighted gain, G_I, and then to examine
the components that compose G_I.

Intelligibility-weighted gain

Figure  7.4  shows  the  steady-state  performance  for  systems  using  both  modifications. 
As  before,  the  steady-state  performance  was  determined  from  the  average  value  of 
G_I for the last five sentences. The values shown for TJR = 0 dB and TJR = 20 dB are
repeated  from  Figs.  7.2  and  7.3. 

Each plot in Fig. 7.4 shows G_I versus the direct-to-reverberant ratio for three input
TJRs. Comparing the three curves in each plot shows that the best performance is
obtained when the input TJR is low, but very good performance is obtained for
all input TJRs. For all values of input TJR, G_I is positive, indicating that the
systems always provide some benefit. Furthermore, comparing the right and left
sides of Fig. 7.4 reveals that the systems are robust to misalignment. Comparison
with Figs. 7.2 and 7.3 indicates that the combination of the two modifications is
responsible for this robustness to misalignment and high input TJR.

The results in Fig. 7.4 show that the best performance is obtained in anechoic
environments and performance decreases with increasing reverberation. In moderate
reverberation, the longer filters provide substantial benefits over the shorter filters.
In strong reverberation, G_I approaches a small, positive value regardless of filter
length. These results are consistent with previously reported trends of performance
for a two-microphone system in reverberation for the limited case of no target signal
(Greenberg and Zurek, 1992).

For the conditions studied here, the five-microphone array shows a slight
performance advantage over the two-microphone array. However, these two arrays perform
comparably because there is only one directional jammer. The two-microphone array
can form one independent broadband null, while the five-microphone array can
create four independent broadband nulls. Therefore, the five-microphone array will have
a substantial advantage over the two-microphone array in the presence of multiple
directional jammers. The number of microphones required in a practical system will
be discussed in Sec. 8.2.

Figure 7-4: Intelligibility-weighted gain, G_I, versus direct-to-reverberant ratio,
showing steady-state performance with both modifications for three input TJRs. Anechoic
results are shown at a direct-to-reverberant ratio of +20 dB. The arrays were either
perfectly aligned to the straight-ahead target or misaligned by 10°. The systems tested
were two- and five-microphone arrays with 100- and 1000-point adaptive filters. Also
shown are results of the underlying fixed system, described in the text.

The points marked by ‘x’ in Fig. 7.4 indicate the performance of a fixed
beamformer with uniform weights, that is, the system shown in Fig. 7.1 with the
adaptive filter disconnected. In extreme reverberation, the performance of the system
approaches that of the underlying fixed system, that is, the output is simply the
primary signal because the adaptive filter weights approach zero. This was demonstrated
previously for a two-microphone system with no target signal (Greenberg and Zurek,
1992). The most reverberant condition simulated here (−2 dB direct-to-reverberant
ratio) has sufficient directional components so that the adaptive filter weights are
nonzero and the performance may exceed that of the underlying fixed beamformer.
Even so, the current results demonstrate that with both modifications, this trend
toward ‘graceful failure’ of the adaptive algorithm is extended to cases including both
target and jammer signals, as expected.
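The relationship between the adaptive system and its underlying fixed beamformer
can be illustrated with a minimal sketch of a generalized sidelobe canceller. The
structure below is an assumption made for illustration: the averaging primary channel,
the adjacent-difference reference channels, and the name gsc_output are hypothetical
and are not taken from Fig. 7.1. The sketch shows only that zero adaptive weights
leave the delayed primary signal at the output, and that a perfectly aligned target
never reaches the adaptive filters.

```python
def gsc_output(mics, weights, delay):
    """Sidelobe-canceller output for time-aligned microphone signals (sketch).

    mics:    list of M equal-length sample lists, already steered to the target
    weights: list of M-1 tap lists, one FIR filter per reference channel
    delay:   primary delay D in samples
    """
    m = len(mics)
    n_samples = len(mics[0])
    # Fixed beamformer with uniform weights (assumed here to be an average).
    primary = [sum(ch[n] for ch in mics) / m for n in range(n_samples)]
    # Reference channels: adjacent differences, which cancel an aligned target.
    refs = [[mics[i][n] - mics[i + 1][n] for n in range(n_samples)]
            for i in range(m - 1)]
    out = []
    for n in range(n_samples):
        d = primary[n - delay] if n >= delay else 0.0
        y = 0.0
        for ref, taps in zip(refs, weights):
            for k, w in enumerate(taps):
                if n - k >= 0:
                    y += w * ref[n - k]
        out.append(d - y)  # delayed primary minus adaptive-filter output
    return out


# Zero weights: the output is the delayed fixed-beamformer (primary) signal.
print(gsc_output([[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]], [[0.0, 0.0]], 1))
```

With nonzero weights, only signal components that differ between microphones
(off-axis jammers, reflections, or a misaligned target) can change the output.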

Components  of  intelligibility-weighted  gain 

Considering the intelligibility-weighted measures that contribute to G_I, discussed in
Sec. 3.3, provides additional understanding of the behavior of the system. Figures
7.5-7.8 display values of Δ_I(T_d), Δ_I(T_r), Δ_I(T), and Δ_I(J) derived from the same
output signals used to produce the values of G_I shown in Fig. 7.4.

Under ideal conditions, the generalized sidelobe canceller should exactly preserve
the direct target signal, that is, Δ_I(T_d) should equal 0 dB. Under nonideal
conditions, misalignment and target reflections are the two possible causes of direct target
cancellation. In accordance with the results of Ch. 6, all of the systems evaluated
used a relatively short primary delay of D = 50 (5 ms). As discussed in Sec. 6.4, for
these source-to-microphone impulse responses, the first off-axis reflection arrives at
the microphones 66 samples after the direct wave. Therefore, for aligned arrays, the
simulation results should show no cancellation of direct target due to target
reflections.
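The timing argument can be checked with a small sketch. It assumes a simplified
model in which a causal adaptive filter could cancel the direct target only by using a
tap that lines up a target reflection in the reference channel with the direct target
in the delayed primary channel. The function name cancelling_tap and the model itself
are illustrative assumptions; only D = 50, the 5-ms equivalent, and the 66-sample
reflection lag come from the text.

```python
def cancelling_tap(primary_delay, reflection_lag, filter_length):
    """Tap index that aligns a reflection (reference channel) with the direct
    target (delayed primary), or None if no such causal tap exists."""
    tap = primary_delay - reflection_lag
    if 0 <= tap < filter_length:
        return tap
    return None


D = 50                        # primary delay in samples (from the text)
FIRST_REFLECTION = 66         # lag of the first off-axis reflection (from the text)
SAMPLE_RATE_HZ = D / 5 * 1000  # 50 samples in 5 ms implies 10 kHz

print(SAMPLE_RATE_HZ)                              # 10000.0
print(cancelling_tap(D, FIRST_REFLECTION, 100))    # None: no causal tap exists
print(cancelling_tap(80, FIRST_REFLECTION, 100))   # 14: a longer delay would allow it
```

Under this model, any primary delay shorter than the first reflection lag leaves no
tap with which the reflection could be used against the direct target.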

Figure 7.5 shows the effect of the systems on the direct target. As expected, for
the two-microphone array, the direct target signal was completely preserved. The
five-microphone system showed slight cancellation of the direct target (−1.2 dB <
Δ_I(T_d) < 0 dB) due to level and phase differences at pairs of microphones placed
asymmetrically with respect to the direction of propagation. This can be considered
a form of misalignment due to the fact that the room simulation produces spherical
radiation, so the target signal at the array was not an ideal plane wave.

For misaligned arrays, target cancellation is possible because the system is
constrained to preserve sources arriving from 0°, but the target source actually arrives
from 10°. Despite this violation of the assumption of known target direction, with
both modifications the two-microphone arrays exhibit very little direct target
cancellation (less than 2 dB). However, the five-microphone array shows significant
cancellation of the direct target, particularly in an anechoic environment. This is because
the five-microphone system can steer four independent broadband nulls. Even though
the modifications prevent adaptation when the target signal is strong, the system can
steer multiple nulls and therefore directs one at the jammer and another at the
misaligned target. Without the modifications, this additional null would be much deeper.
This illustrates one of the major differences between the two- and five-microphone
arrays; it will be discussed more thoroughly in Sec. 8.2.

Two  additional  features  of  the  results  in  Fig.  7.5  for  the  five-microphone  array 
deserve  mention.  First,  the  most  extreme  direct  target  cancellation  occurs  at  0  dB 
input  TJR.  This  effect  was  reported  previously  (Greenberg  and  Zurek,  1992)  and 
occurs  because  the  correlation  method  is  more  accurate  and  therefore  more  effective 
against  misalignment  at  higher  input  TJR.  Second,  it  appears  that  the  shorter  filter 
results  in  more  target  cancellation  than  the  longer  filter.  This  misleading  result  is  due 
to  the  fact  that  even  after  40  sentences,  the  five-microphone  array  with  1000-point 
filters  has  not  converged  completely.  This  long  convergence  time  will  be  discussed 
in  Sec.  7.4.  For  now,  it  is  sufficient  to  note  that  performance  obtained  by  such  long 
convergence  times  is  irrelevant  in  a  practical  device. 

Figure 7-5: Effect of systems on direct target, Δ_I(T_d), versus direct-to-reverberant
ratio with both modifications for three input TJRs. Anechoic results are shown at
a direct-to-reverberant ratio of +20 dB. The arrays were either perfectly aligned to
the straight-ahead target or misaligned by 10°. The systems tested were two- and
five-microphone arrays with 100- and 1000-point adaptive filters.

Figure 7.6 shows the effect of the systems on target reflections for the two
reverberant conditions. The values of Δ_I(T_r) shown in Fig. 7.6 range from −4 dB to +1
dB. Positive values of Δ_I(T_r) only occur for −20 dB input TJR, in which case the
slight amplification of target reflections is a side effect of weights which have adapted
to cancel the dominant jammer signal. Negative values of Δ_I(T_r) indicate some
cancellation of target reflections. In general, target reflections are subject to increased
cancellation at high TJR, at lower direct-to-reverberant ratio, with longer filters,
and with more microphones. These correspond to situations when the reflections are
stronger and when the system has the capability to cancel them.

Figure 7.7 shows the effect of the systems on the total target signal, and Fig. 7.8
shows the effect of the systems on the jammer signal. Comparing the relative
magnitudes of the values shown in Figs. 7.7 and 7.8 indicates that Δ_I(J) is the dominant
component of the values of G_I shown in Fig. 7.4. The only exception to this is the
misaligned, five-microphone array in an anechoic environment. In this case, the
system produces significant cancellation of the direct target, as discussed above. This
target cancellation is undesirable, but for these conditions the system provides even
more cancellation of the jammer signal. For example, with L = 100 and the input
TJR = 0 dB, Δ_I(T) = −19 dB and Δ_I(J) = −31 dB, producing G_I = 12 dB. A real
system will require automatic gain control to maintain output levels, so such target
cancellation is tolerable as long as jammer cancellation exceeds target cancellation.
The disadvantage is that requiring additional gain to restore the output levels also
amplifies any uncorrelated noise, including microphone noise. It is also important
to note that this target cancellation only occurred for the anechoic condition, and is
drastically reduced in moderate reverberation. It is anticipated that this will not be
a problem with real systems operating in real environments.
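The arithmetic of the example above can be written out as a short sketch. It assumes
that the intelligibility-weighted gain is the target level change minus the jammer
level change, with cancellation expressed as a negative level change (19 dB of target
cancellation and 31 dB of jammer cancellation); this reading matches the quoted gain of
12 dB. The function names are hypothetical.

```python
def intelligibility_weighted_gain(delta_target_db, delta_jammer_db):
    """Gain as the difference between target and jammer level changes (assumed)."""
    return delta_target_db - delta_jammer_db


def after_makeup_gain(delta_target_db, delta_jammer_db):
    """Levels after an idealized gain control restores the target level.

    Returns (make-up gain, target change, jammer change), all in dB. The make-up
    gain is also applied to any uncorrelated noise at the system output.
    """
    makeup_db = -delta_target_db
    return makeup_db, delta_target_db + makeup_db, delta_jammer_db + makeup_db


print(intelligibility_weighted_gain(-19.0, -31.0))  # 12.0
print(after_makeup_gain(-19.0, -31.0))              # (19.0, 0.0, -12.0)
```

The make-up gain leaves the 12-dB advantage unchanged but raises output noise by
the same 19 dB, which is the disadvantage noted above.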

Figure 7-6: Effect of systems on reflected target, Δ_I(T_r), versus direct-to-reverberant
ratio with both modifications for three input TJRs. Anechoic results are shown at
a direct-to-reverberant ratio of +20 dB. The arrays were either perfectly aligned to
the straight-ahead target or misaligned by 10°. The systems tested were two- and
five-microphone arrays with 100- and 1000-point adaptive filters.

Figure 7-7: Effect of systems on total target, Δ_I(T), versus direct-to-reverberant
ratio with both modifications for three input TJRs. Anechoic results are shown at
a direct-to-reverberant ratio of +20 dB. The arrays were either perfectly aligned to
the straight-ahead target or misaligned by 10°. The systems tested were two- and
five-microphone arrays with 100- and 1000-point adaptive filters.

Figure 7-8: Effect of systems on total jammer, Δ_I(J), versus direct-to-reverberant
ratio with both modifications for three input TJRs. Anechoic results are shown at
a direct-to-reverberant ratio of +20 dB. The arrays were either perfectly aligned to
the straight-ahead target or misaligned by 10°. The systems tested were two- and
five-microphone arrays with 100- and 1000-point adaptive filters.

Polar patterns

Additional insight into the behavior of adaptive arrays is obtained by considering
the magnitude response of the arrays as a function of frequency and source angle for
distant plane-wave sources. For broadside arrays of omnidirectional microphones in
free space, these responses are cylindrically symmetric about the array axis (90° -
270°) and therefore completely specified by their response in the horizontal plane.
The magnitude response of the array is generated by computing the response of
the generalized sidelobe canceller to pure tones traveling as ideal plane waves with
wavefronts orthogonal to the horizontal plane for each azimuthal angle. This results
in a response that is a function of both angle and frequency. The broadband response
is produced using intelligibility-weighted averaging, that is, taking the magnitude
squared at the center frequency of each one-third-octave band, computing the level
in dB, and applying Articulation Index weights to combine bands, as in (3.1).
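The band-weighted polar response described above can be sketched for the simplest
case, a fixed broadside array with uniform weights. Everything specific in the sketch is
an assumption: the speed of sound, the four band centers, and the equal band weights
are placeholders (they are not the Articulation Index weights of (3.1)), and the 7-cm
spacing is borrowed from the two-microphone array dimension quoted later in this
chapter. The function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def array_response(weights, positions_m, freq_hz, angle_deg):
    """Complex response of a line array to a plane wave; 0 deg is straight ahead."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    s = math.sin(math.radians(angle_deg))
    return sum(w * cmath.exp(1j * k * x * s) for w, x in zip(weights, positions_m))


def weighted_response_db(weights, positions_m, band_centers_hz, band_weights, angle_deg):
    """Weighted sum over bands of the response level in dB at each band center."""
    total = 0.0
    for f, a in zip(band_centers_hz, band_weights):
        power = abs(array_response(weights, positions_m, f, angle_deg)) ** 2
        total += a * 10.0 * math.log10(max(power, 1e-12))  # floor avoids log(0)
    return total


positions = [-0.035, 0.035]             # two microphones, 7 cm apart
uniform = [0.5, 0.5]                    # fixed beamformer with uniform weights
bands = [500.0, 1000.0, 2000.0, 4000.0]  # placeholder band centers
band_w = [0.25, 0.25, 0.25, 0.25]        # placeholder weights summing to one

for angle in (0, 45, 90, 135, 180):
    print(angle, round(weighted_response_db(uniform, positions, bands, band_w, angle), 2))
```

Replacing the fixed weights with frequency responses derived from a snapshot of the
adaptive filters would give patterns of the kind plotted in the figures.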

Figures 7.9-7.14 show the intelligibility-weighted polar patterns^ for the 24
conditions (two filter lengths, two array sizes, two array orientations, three levels of
reverberation) with 0 dB input TJR shown in Fig. 7.4. In addition, the corresponding
value of G_I from Fig. 7.4 is shown below each plot. Each polar pattern was
computed from a single snapshot of the adaptive weights obtained after the system
adapted on the entire sequence of 40 sentences and babble.

Figure 7.9 shows responses of the two-microphone array in an anechoic
environment. The system has unity gain to signals arriving from 0°, as constrained by the
generalized sidelobe canceller structure, and has formed a null in the direction of the
jammer signal (45° for the aligned array and 55° for the misaligned array). Because
the two-microphone system has one broadband degree of freedom, it can create one
independent null. The second null (at 135° or 125°) is a result of the cylindrical
symmetry of the broadside array. The polar pattern indicates that sources arriving
from angles between 180° and 360° are slightly amplified by the system. However,
these adaptive weights were obtained with no signals arriving at the array from those
directions, so there was no reason for the system to prevent amplification of signals
arriving from those directions.
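The unity gain at 0°, the single steered null, and its mirror image can be reproduced
with a narrowband sketch of a two-microphone canceller. This is a simplification
made for illustration: one complex weight at one frequency stands in for the broadband
adaptive filter, the 1-kHz tone and the 343 m/s speed of sound are assumptions, and the
7-cm spacing is the two-microphone array dimension quoted later in this chapter. The
function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def inter_mic_phase(freq_hz, spacing_m, angle_deg):
    """Phase difference between two broadside microphones for a plane wave."""
    return (2.0 * math.pi * freq_hz * spacing_m
            * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND)


def canceller_response(freq_hz, spacing_m, angle_deg, weight):
    """Primary (average of the microphones) minus weighted reference (difference)."""
    e = cmath.exp(1j * inter_mic_phase(freq_hz, spacing_m, angle_deg))
    return 0.5 * (1.0 + e) - weight * (1.0 - e)


def null_weight(freq_hz, spacing_m, jammer_deg):
    """Single complex weight that places a null on the jammer direction."""
    e = cmath.exp(1j * inter_mic_phase(freq_hz, spacing_m, jammer_deg))
    return 0.5 * (1.0 + e) / (1.0 - e)


w = null_weight(1000.0, 0.07, 45.0)
for angle in (0.0, 45.0, 135.0):
    print(angle, abs(canceller_response(1000.0, 0.07, angle, w)))
```

The response depends on the angle only through its sine, so the null steered to 45°
necessarily reappears at 135°, while the difference channel is zero at 0° and leaves
the straight-ahead gain at unity whatever the weight.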

Figure 7.10 shows polar patterns for the five-microphone array in an anechoic
environment. For the most part, these patterns are similar to those seen in Fig. 7.9.
One notable difference occurs in the misaligned case, where the five-microphone
system steers a second independent null in the direction of the misaligned target (10°).
This second null was discussed previously as the cause of the target cancellation seen
in Fig. 7.5.

In Figs. 7.9 and 7.10, the depth of the nulls in the jammer direction is not exactly
equal to the corresponding values of Δ_I(J) shown in Fig. 7.8. This is because the
polar pattern is based on a single snapshot of the adaptive weights, while Δ_I(J) is
determined from the spectra of output signals obtained while the adaptive weights
were fluctuating.

^According to the convention for polar plots, positive angles progress counterclockwise from zero
degrees. These plots are consistent with the definition of positive angles given in Sec. 3.2 if they are
interpreted as being viewed from below the array.

Figure 7-9: Polar patterns showing the intelligibility-weighted response of the
two-microphone system in an anechoic environment. The input TJR was 0 dB. Radial
grid lines are in 10 dB increments, and the dashed line indicates 0 dB. Values of G_I
are from corresponding conditions in Fig. 7.4.

Figure 7-10: Polar patterns showing the intelligibility-weighted response of the
five-microphone system in an anechoic environment. The input TJR was 0 dB. Radial
grid lines are in 10 dB increments, and the dashed line indicates 0 dB. Values of G_I
are from corresponding conditions in Fig. 7.4.

Figures 7.11-7.14 show polar patterns for the two- and five-microphone arrays in
two reverberant environments. In interpreting these polar patterns, it is important
to realize that the responses shown are for independent sources arriving from each
angle, but that while adapting in reverberation, the weights are adjusted to minimize
total output power, which may be accomplished by coherent addition of correlated
sources (reflections) arriving from multiple angles. Comparing the two filter lengths
in Figs. 7.11-7.14 reveals that shorter filters have deeper nulls, but the results in
Fig. 7.4 indicate that the longer filters are associated with larger values of G_I. This
is because the short filter minimizes the total output power by forming a null in the
direction of the direct jammer, while the longer filter obtains additional benefits by
using the adaptive weights to add coherent signals (arriving from different directions)
out of phase, thereby cancelling the direct jammer as well as some jammer reflections.

This is verified in Fig. 7.15, which shows impulse responses for the two-microphone,
aligned array in moderate reverberation. The top panel shows the
jammer-source-to-microphone impulse response for the left microphone. The amplitude of the direct
wave is 0.04; it was clipped in the figure to show the reflections in more detail. The
middle and bottom panels show the source-to-system-output impulse responses for
100- and 1000-point filters, respectively, computed with the same snapshot of the
adaptive weights used to generate the left half of Fig. 7.11. These impulse responses
indicate roughly equal cancellation of the direct wave for the two filter lengths, and
superior cancellation of reflections for the longer filter.

In a hearing aid, motion of the listener’s head will cause the array position to vary
with respect to the sound sources. The width of the nulls shown in Figs. 7.9-7.14
indicates that cancellation of the direct wave will be robust to slight head movements.
However, as discussed previously, the longer filters provide improved cancellation in
reverberation by adding coherent reflections out of phase. Cancellation of this type
may be adversely affected by even slight head movements. Future evaluations with a
real system should study the effects of head movements in order to assess the practical
importance of the benefits of long filters in reverberation demonstrated here.

G_I = 21.7 dB    G_I = 18.9 dB

Figure 7-11: Polar patterns showing the intelligibility-weighted response of the
two-microphone system in moderate reverberation. The input TJR was 0 dB. Radial
grid lines are in 10 dB increments, and the dashed line indicates 0 dB. Values of G_I
are from corresponding conditions in Fig. 7.4.

G_I = 27.6 dB    G_I = 21.4 dB

Figure 7-12: Polar patterns showing the intelligibility-weighted response of the
five-microphone system in moderate reverberation. The input TJR was 0 dB. Radial
grid lines are in 10 dB increments, and the dashed line indicates 0 dB. Values of G_I
are from corresponding conditions in Fig. 7.4.

G_I = 6.4 dB    G_I = 7.3 dB

Figure 7-13: Polar patterns showing the intelligibility-weighted response of the
two-microphone system in strong reverberation. The input TJR was 0 dB. Radial
grid lines are in 10 dB increments, and the dashed line indicates 0 dB. Values of G_I
are from corresponding conditions in Fig. 7.4.

G_I = 11.8 dB    G_I = 12.2 dB

Figure 7-14: Polar patterns showing the intelligibility-weighted response of the
five-microphone system in strong reverberation. The input TJR was 0 dB. Radial
grid lines are in 10 dB increments, and the dashed line indicates 0 dB. Values of G_I
are from corresponding conditions in Fig. 7.4.

aligned, M = 2, moderate reverberation

Figure 7-15: Impulse responses for two-microphone, aligned array in moderate
reverberation. The top panel shows one jammer-source-to-microphone impulse response,
with direct wave amplitude of 0.04 (clipped). The middle and bottom panels show
the jammer-source-to-system output impulse responses for 100- and 1000-point filters,
respectively.

Figure 7.16 shows polar patterns for the responses of the underlying fixed
systems. Included for comparison is a two-microphone array with the same span as the
five-microphone array. These patterns are constant for all configurations of inputs.
The polar plots show that the fixed systems are robust to misalignment (the main
beam is fairly broad around 0°) and that in the presence of directional jammers, the
two-microphone arrays provide gains of up to 5 dB, while the five-microphone array
provides gains of up to 10 dB, where the actual attenuation depends on angle of
arrival.

In extreme reverberation, the performance of the fixed system is characterized
by its gain against isotropic noise, which is equivalent to its directivity index. The
intelligibility-weighted directivity index (3.11) is D_I = 1.7 dB for the 7-cm
two-microphone array, D_I = 2.5 dB for the 16-cm two-microphone array, and D_I = 3.0
dB for the 16-cm five-microphone array. It was found that two- and five-microphone
arrays of the same length have similar values of D_I because at low frequencies, the
two-microphone fixed array with uniform weights has a relatively narrow mainlobe,
while at high frequencies, the five-microphone arrays have lower sidelobes. Both
a narrow mainlobe and low sidelobes increase the directivity index. For the array
dimensions and frequency ranges considered here, the weighted frequency averaging
results in comparable values of D_I.
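The gain of a uniform-weight broadside array against isotropic noise can be sketched
at a single frequency. The sketch relies on the standard result that the coherence of
spherically isotropic noise between two omnidirectional microphones a distance d apart
is sin(kd)/(kd). It is not the intelligibility-weighted index of (3.11): no band
weighting is applied, the speed of sound is assumed, the even 4-cm spacing of the
five-microphone array is an assumption, and the function names are hypothetical, so the
printed values are not expected to match the figures quoted above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def sinc(x):
    """sin(x)/x with the limiting value at zero."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def directivity_index_db(positions_m, freq_hz):
    """Single-frequency directivity index of a uniform-weight broadside array."""
    m = len(positions_m)
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    # Output noise power for weights 1/M in isotropic noise; the straight-ahead
    # target passes with unity gain, so the index is the inverse of this power.
    noise = sum(sinc(k * abs(xi - xj))
                for xi in positions_m for xj in positions_m) / (m * m)
    return -10.0 * math.log10(noise)


two_mic_7cm = [-0.035, 0.035]
two_mic_16cm = [-0.08, 0.08]
five_mic_16cm = [-0.08, -0.04, 0.0, 0.04, 0.08]  # even spacing assumed

for f in (500.0, 1000.0, 2000.0, 4000.0):
    print(f, [round(directivity_index_db(p, f), 2)
              for p in (two_mic_7cm, two_mic_16cm, five_mic_16cm)])
```

Evaluating the function over frequency for arrays of equal span is one way to
examine the mainlobe and sidelobe trade-off described in the text.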

The performance of the underlying fixed system can be improved by using
directional microphones (Soede et al., 1993a; Stadler and Rabinowitz, 1993). The fixed
performance expected with directional microphones and the effect of directional
microphones on the adaptive system will be discussed in Sec. 8.3.

Figure 7-16: Polar patterns showing the intelligibility-weighted response of the
underlying fixed systems for two- and five-microphone arrays.

7.3.3 Summary of steady-state performance

The  steady-state  performance  of  the  systems  investigated  is  summarized  as  follows: 

•  Using both modifications always gives better steady-state performance than
using either modification alone or no modifications. Without the modifications,
performance decreases dramatically with misalignment or increased TJR. With
both modifications, the system is robust to misalignment and high TJR, and
G_I is always positive.

•  Using  a  short  primary  delay  prevents  cancellation  of  the  direct  target  due  to 
target  reflections. 

•  With both modifications, the intelligibility-weighted gain, G_I, decreases from
20-30 dB in an anechoic environment to 3-10 dB in strong reverberation.

•  In  an  anechoic  environment,  performance  is  independent  of  filter  length.  In 
moderate  reverberation,  longer  filters  perform  much  better  than  shorter  filters. 

•  For  the  single  jammer  case  studied,  the  five-microphone  array  has  only  a  slight 
advantage  over  the  two-microphone  array. 


7.4  Transient  performance 

7.4.1  Effect  of  modifications 

This section considers the transient performance of the algorithm with and without
modifications. Initially, it is useful to summarize the following factors affecting
convergence of the adaptive filters, many of which are derived from (4.70), (4.74), and
(4.76).

•  The convergence time is proportional to the total number of adaptive filter taps
(the product of the filter length and one less than the number of microphones).

•  Using the sum method of normalizing the step-size parameter slows the
convergence. At low input TJR, the convergence is initially slower than the traditional
method by a factor of 2, and at high input TJR the convergence is slower by a
factor proportional to the TJR.

•  Using the correlation method of controlling adaptation also slows convergence.
Since the correlation method inhibits update of the adaptive weights on some
iterations, the convergence time is longer by a factor equal to the reciprocal of
the fraction of cycles for which the system actually adapted.

•  The convergence time is inversely proportional to the dimensionless step-size
parameter, α, which was held constant in these simulations.

•  The convergence time is affected by the spread of the eigenvalues of the
autocorrelation matrix of the reference signal.
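The proportionalities listed above can be combined into a rough relative estimate.
The sketch below is illustrative only: it assumes that the individual factors simply
multiply, it ignores the eigenvalue spread, and the adapting fractions used in the
example are hypothetical rather than measured. The function names are likewise
hypothetical.

```python
def adaptive_taps(filter_length, num_mics):
    """Total number of adaptive taps: L times (M - 1)."""
    return filter_length * (num_mics - 1)


def correlation_slowdown(fraction_adapting):
    """Slow-down factor when only a fraction of iterations update the weights."""
    return 1.0 / fraction_adapting


def relative_convergence_time(filter_length, num_mics,
                              sum_method_factor=1.0, fraction_adapting=1.0):
    """Convergence time in arbitrary units, for a fixed step-size parameter."""
    return (adaptive_taps(filter_length, num_mics) * sum_method_factor
            * correlation_slowdown(fraction_adapting))


baseline = relative_convergence_time(100, 2)
print(relative_convergence_time(1000, 5) / baseline)  # 40.0
# Both modifications at low input TJR: factor of 2 from the sum method and,
# hypothetically, adaptation on half of the iterations.
print(relative_convergence_time(100, 2, 2.0, 0.5) / baseline)  # 4.0
```

On this estimate the five-microphone array with 1000-point filters has 40 times as
many taps as the two-microphone array with 100-point filters, before any slow-down
from the modifications is included.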

Section 7.3 considered the steady-state performance after the system was allowed
to adapt on a sequence of 40 sentences. Figure 7.17 shows the intelligibility-weighted
gain, G_I, as a function of sentence number for all of the misaligned conditions with
input TJR of 0 dB, and Fig. 7.18 shows results for the same conditions with input
TJR of +20 dB.^ Results are not shown for the aligned array, which demonstrated
similar behavior except for predictable exceptions due to the ability of the system to
cancel misaligned targets.

The four curves on each plot in Figs. 7.17 and 7.18 indicate the transient
performance of the four processing options (sum method of normalizing the step-size
parameter, correlation method of controlling adaptation, both modifications, and
neither modification). Because the effect of each modification individually is to slow
convergence, it is expected that the system using both modifications will converge
most slowly. This is confirmed by the results in Figs. 7.17 and 7.18. For many
conditions, the systems converged sufficiently rapidly that performance is constant over all
40 sentences.^ However, for those conditions where the convergence can be detected
over successive sentences, the fastest convergence is obtained with no modifications,
followed by one modification, and finally by the algorithm with both modifications,
as expected.

^The data for misaligned conditions shown in Figs. 7.2 and 7.3 correspond to the mean values of
G_I for the last five sentences of the data shown in Figs. 7.17 and 7.18.

Figure 7-17: Transient performance shown by intelligibility-weighted gain, G_I, versus
sentence number with and without the two modifications. The input TJR was 0 dB
and the arrays were misaligned by 10°. Two- and five-microphone arrays were tested
with 100- and 1000-point adaptive filters in three environments (anechoic, moderate
reverberation, and strong reverberation).

Figure 7-18: Transient performance shown by intelligibility-weighted gain, G_I, versus
sentence number with and without the two modifications. The input TJR was +20 dB
and the arrays were misaligned by 10°. Two- and five-microphone arrays were tested
with 100- and 1000-point adaptive filters in three environments (anechoic, moderate
reverberation, and strong reverberation).

Obviously,  the  goal  of  these  systems  is  to  obtain  the  best  possible  performance; 
there  is  no  advantage  in  converging  quickly  to  poor  steady-state  performance.  Figures 
7.2  and  7.3  showed  that  after  converging  to  steady-state,  performance  is  best  with 
both modifications. Figures 7.17 and 7.18 show that even during the transients, use of
both modifications provides the best performance for most conditions. The exception
is two- and five-microphone arrays with 1000-point filters, when the input TJR is +20
dB. In this case, the correlation method alone initially has a slight advantage over both
modifications. However, this is irrelevant for a practical system since it occurs when
the input TJR is +20 dB and the unprocessed speech is already highly intelligible.
Although systems using both modifications typically converge more slowly to superior
steady-state performance, during the transient they perform at least as well as faster-
converging systems without both modifications.

As stated above, the effect of the correlation method of controlling adaptation is
to increase the convergence time by a factor equal to the reciprocal of the fraction
of cycles for which the system actually adapted. These percentages can be obtained
empirically for particular test conditions. Figure 7.19 shows the percentage of cycles
for which the system adapted when the correlation method was used. The trends are
roughly the same for two- and five-microphone arrays and for aligned and misaligned
arrays. When the input TJR is -20 dB, the system typically adapts at least 80% of
the time, and when the input TJR is 0 dB, the system typically adapts 50-70% of
the time. Therefore, when the input TJR is 0 dB or lower, the correlation method
increases the convergence time at most by a factor of 2. When the input TJR is +20
dB, the system adapts on 20-30% of cycles.

®The length of the first sentence is 3.4 seconds, and the average sentence length is 2.9 seconds.
Although there was an onset transient within the first sentence, it was sufficiently short that it did
not affect the average powers used to compute G_I for the entire sentence. For these conditions,
quantifying the transient performance would require recomputing G_I for short segments of the first
sentence. Individual inspection of the output waveforms for a number of these conditions revealed
that systems using both modifications typically converged within one second.
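The reciprocal relationship between the fraction of cycles on which the system adapts and the resulting convergence time can be illustrated with a short calculation. The following is a minimal sketch only; the function name is hypothetical, and the percentages are the approximate values quoted in the text rather than measured data.

```python
def slowdown_factor(fraction_adapting):
    """Convergence-time multiplier when adaptation occurs on only a fraction of cycles.

    Assumes the convergence time scales as the reciprocal of the fraction
    of cycles on which the weights are actually updated.
    """
    if not 0.0 < fraction_adapting <= 1.0:
        raise ValueError("fraction_adapting must be in (0, 1]")
    return 1.0 / fraction_adapting


# Approximate adaptation percentages quoted in the text for each input TJR.
quoted = [("-20 dB", 80), ("0 dB", 70), ("0 dB", 50), ("+20 dB", 30), ("+20 dB", 20)]
for tjr, percent in quoted:
    factor = slowdown_factor(percent / 100.0)
    print(f"input TJR {tjr}: adapting {percent}% of cycles -> factor {factor:.2f}")
```

Under this assumption, adapting on 50% of cycles doubles the convergence time, while adapting on 20-30% of cycles lengthens it by a factor of roughly 3 to 5.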

Next  it  is  useful  to  consider  trends  in  the  transient  behavior  of  the  system  using 
both  modifications  (solid  lines  in  Figs.  7.17  and  7.18).  When  the  input  TJR  is  0  dB, 
the  system’s  ability  to  adapt  quickly  to  new  interference  sources  is  likely  to  have  a 
large  effect  on  intelligibility.  Figure  7.17  shows  that  when  the  input  TJR  is  0  dB, 
the  algorithm  converges  by  the  end  of  the  first  sentence  for  most  conditions.  The 
exception  is  the  five-microphone  array  in  an  anechoic  environment. 

When the five-microphone array operates in the anechoic environment, the adaptive
filter has two modes that converge at different rates and have competing effects
on G_I. One mode converges more rapidly and cancels the jammer, while the other,
slower mode adapts to cancel the misaligned target. Since this second mode of
adaptation results in poorer performance, the fact that it converges slowly is actually
an advantage. If temporal fluctuations of source and array locations in a real system
prevented convergence of that second mode, it would actually be beneficial.

The competing effects of these two modes are seen for the five-microphone array
with L = 100, and can be better understood by considering the values of the quantities
that comprise G_I (not shown). For this condition, G_I is initially 20 dB due to Ar( J).
As time passes, the value of Ar( J) rises to 30 dB, while the second mode converges
to produce Ar(Tji) of approximately -20 dB, resulting in G_I near 10 dB. For the
five-microphone array with L = 1000, G_I is roughly constant, but the algorithm is
not actually converged. Examination of AT{Tci) and Ar(J) reveals that with the
longer filter, both modes converge slowly and neither mode has reached steady-state
at the end of 40 sentences.

Compared to Fig. 7.17, the results shown in Fig. 7.18 are considerably less important
for a practical hearing aid. As stated earlier, when the input TJR is +20 dB,
the unprocessed signals are already highly intelligible. In this case, the magnitude of
G_I is relatively unimportant as long as G_I is positive. Figure 7.18 does show that
with both modifications, both two- and five-microphone arrays with 1000-point filters
have relatively long convergence times. In both anechoic and moderately reverberant
rooms, the convergence time is on the order of 10 sentences (30 seconds). Since source
and array locations may not remain constant for that long in a real system, the full
benefits may not be realized at high input TJRs.

Figure 7-19: Percent of time the system adapted according to the correlation method,
versus direct-to-reverberant ratio, for three input TJRs. Anechoic results are shown
at a direct-to-reverberant ratio of +20 dB. The two- and five-microphone arrays were
either perfectly aligned or misaligned by 10°.

Because  the  convergence  time  is  proportional  to  the  total  number  of  adaptive  filter 
taps, it is expected that the 2-microphone array with 100-point filters will converge
most  quickly,  followed  by  the  5-microphone  array  with  100-point  filters,  then  the 
2-microphone  array  with  1000-point  filters,  and  finally  the  5-microphone  array  with 
1000-point  filters.  The  simulation  results  match  these  expectations,  best  illustrated 
by  the  moderate  reverberation  conditions  in  Fig.  7.18. 
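The expected ordering can be checked by counting adaptive filter taps for each configuration. The sketch below assumes one adaptive filter per reference channel and M - 1 reference channels for an M-microphone array (consistent with the four reference channels mentioned elsewhere for the five-microphone system); the function name is hypothetical.

```python
def total_taps(num_mics, filter_length):
    """Total adaptive taps, assuming num_mics - 1 reference channels,
    each followed by one adaptive filter of filter_length points."""
    return (num_mics - 1) * filter_length


# (number of microphones, filter length) for the four simulated systems.
configs = [(5, 1000), (2, 1000), (5, 100), (2, 100)]

# Sorting by tap count gives the predicted order of convergence, fastest first.
ranked = sorted(configs, key=lambda cfg: total_taps(*cfg))
for mics, length in ranked:
    print(f"{mics} microphones, {length}-point filters: {total_taps(mics, length)} taps")
```

If convergence time is proportional to the tap count, this reproduces the ordering stated above: 100, 400, 1000, and 4000 taps, respectively.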

The transient values of G_I shown in Figs. 7.17 and 7.18 are for one target and one
jammer  adapting  from  the  initial  condition  with  all  of  the  weights  equal  to  zero.  A 
real  system  is  likely  to  encounter  situations  where  the  weights  have  adapted  to  one 
jammer  configuration  and  then  an  additional  jammer  appears.  If  the  new  jammer 
comes  from  a  direction  where  the  system’s  gain  is  greater  than  0  dB  (as  seen  in  Fig.  7.9 
and  the  upper  half  of  Fig.  7.10),  then  this  jammer  may  initially  be  more  disruptive 
than  it  would  be  in  the  absence  of  the  system.  Therefore,  there  is  some  advantage  to 
systems  with  a  tendency  to  maintain  the  polar  pattern  less  than  0  dB  for  all  angles 
of  incidence,  for  instance  the  five-microphone  array  with  1000-point  filters  as  shown 
in  the  lower  half  of  Fig.  7.10. 

7.4.2  Summary  of  transient  performance 

The  transient  performance  of  the  systems  investigated  is  summarized  as  follows: 

•  For  many  of  the  conditions  studied,  the  system  converged  in  less  than  3  seconds. 

•  Although  the  modifications  appear  to  improve  steady-state  performance  at  the 
expense  of  slower  convergence,  even  during  transients  the  algorithm  with  both 
modifications performs at least as well as faster-converging algorithms without
both  modifications. 

• Although some conditions have extremely long convergence times (30 seconds
or longer), these correspond to situations when the benefits of the system may
not be required (very high TJR) and to situations when the slower converging
mode is detrimental (target cancellation).

These results, which are based on static test conditions, indicate that the transient
behavior of the modified algorithm appears to be sufficient for the current application.
Of course the true test will come from field trials to see how listeners are affected by
time-varying jammer configurations. If such tests indicate that the modified LMS algorithm
developed  here  does  not  provide  fast  enough  convergence,  then  it  will  be  necessary 
to  investigate  other  algorithms  for  adjusting  the  adaptive  filter  weights.  A  promising 
candidate  is  the  fast  affine  projection  algorithm  (Gay,  1993)  which  provides  relatively 
rapid  convergence  at  low  computational  complexity.  This  algorithm  is  based  on  a 
generalization  of  NLMS  and  is  implemented  in  the  time  domain,  and  therefore  can  be 
modified  to  include  the  correlation  method  derived  in  Ch.  5.  Additional  investigation 
is  required  to  determine  if  this  algorithm  could  also  be  modified  to  include  the  sum 
method  derived  in  Ch.  4. 
Chapter 8


Discussion 


The  simulation  results  presented  in  Ch.  7  provide  answers  to  a  number  of  important 
questions about the potential of adaptive microphone-array hearing aids. Clearly,
these results indicate that the modified algorithm is sufficiently promising to warrant
testing with normal-hearing and hearing-impaired listeners in laboratory tests and
field trials. However, practical limitations, such as difficulty simulating headshadow,
head movements and time-varying jammers, make the simulations inadequate for
addressing additional important issues. Two of those issues are the use of directional
microphones and the number of microphones needed in a head-sized array.

The discussion in this chapter first considers the interpretation of the intelligibility-
weighted gain, G_I, used as a performance metric in Ch. 7. That is followed by
discussions of the issues affecting the number of microphones needed in a head-sized
array and the use of directional microphones. Finally, the nature of future laboratory
and field tests is considered.


8.1 Interpretation of intelligibility-weighted gain

The intelligibility-weighted gain, G_I, is a useful measure for quantifying the effect of
a speech transmission system (Greenberg et al., 1993). However, in judging systems
intended for the hearing-aid application, it is necessary to consider the implications
for speech intelligibility in interpreting the significance of improvements predicted by
G_I.

The intelligibility of the system output will depend on many factors, including
the difficulty of the source material and the degree of reverberation. One very important
factor is the intelligibility-weighted target-to-jammer ratio of the system output,
TJR_{I,out}, defined in (3.7). This quantity reflects the intelligibility-weighted target-to-
jammer ratio of the input and the intelligibility-weighted gain due to the system, that
is,

TJR_{I,out} = TJR_{I,in} + G_I,    (8.1)

which is obtained from (3.8).

For normal-hearing listeners, target speech is typically intelligible for TJR_I = 0
dB. As a result, when the input TJR^ is -20 dB, G_I must be roughly +20 dB to
make the output intelligible. On the other hand, when the input TJR is +20 dB,
the signals will be highly intelligible without any processing. In that case, positive
values of G_I are desirable because the system should not degrade intelligibility, but
the magnitude of G_I is relatively unimportant in terms of the intelligibility of the
output.
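The additive relation (8.1) and the two cases just discussed can be worked through numerically. The following is an illustrative sketch; the function names are hypothetical, and the 0 dB intelligibility threshold is the approximate figure quoted above for normal-hearing listeners, not a precise criterion.

```python
def output_tjr_db(input_tjr_db, gain_db):
    """Intelligibility-weighted output TJR per (8.1): input TJR plus G_I, in dB."""
    return input_tjr_db + gain_db


def gain_needed_db(input_tjr_db, threshold_db=0.0):
    """Smallest G_I that brings the output TJR up to threshold_db.

    Returns 0 when the input already meets the threshold (an assumption of
    this sketch: no gain is *required*, although positive gain is still desirable).
    """
    return max(0.0, threshold_db - input_tjr_db)


for tjr_in in (-20.0, 0.0, 20.0):
    needed = gain_needed_db(tjr_in)
    print(f"input TJR {tjr_in:+.0f} dB: required G_I {needed:.0f} dB, "
          f"output TJR {output_tjr_db(tjr_in, needed):+.0f} dB")
```

With an input TJR of -20 dB the required gain is 20 dB, whereas at +20 dB no gain is required and the system need only avoid degrading the signal.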

In the hearing-aid application, G_I and TJR_{I,out} must be interpreted with respect to
hearing-impaired listeners. Studies comparing speech reception thresholds of normal-
hearing and hearing-impaired listeners have estimated that the disability of impaired
listeners is equivalent to a reduction of 10-13 dB in TJR when the jammer consists
of a single competing talker (Festen and Plomp, 1990; Larsby and Arlinger, 1994).
Therefore, a system that produces a G_I of approximately 10 dB for moderate levels
of input TJR is expected to provide significant benefit to hearing-impaired listeners.


8.2  Number  of  microphones 

How many microphones should be used in a microphone-array hearing aid? There is
no simple answer to this question because the number of microphones in the array
has a significant impact on many issues. These issues include cancellation of
directional sources, robustness of the underlying fixed system in extreme reverberation,
complexity of implementation, and the cosmetic acceptability of a practical device.

^Because the input signals were approximately whitened, the unweighted input TJR corresponds
to TJR_{I,in} within 1 dB for the signals used here.

8.2.1  Directional  jammers 

As discussed in Sec. 2.3.1, an M-microphone array can form M - 1 independent
broadband nulls. As a result, any array (two or more microphones) will be effective
against one directional jammer. The simulation results presented in Ch. 7 show that
in the presence of a single jammer source, there is only a slight benefit to having more
than two microphones. Previous work has shown that for infinite length filters, once
the number of microphones exceeds the number of jammers, little or no additional
benefit is obtained by adding more microphones (Peterson, 1989). The current
simulation results indicate that for one directional jammer, this is true for finite filters
as well.

If the environment contains two independent jammers, an array with more than
two microphones can steer nulls in the direction of both jammers, while the behavior
of the two-microphone array depends on the relative jammer locations and levels. If
the jammers have unequal powers, a two-microphone array will attenuate the stronger
of the two jammers, because of the need to minimize total output power, but that
attenuation is limited by the need to avoid substantial amplification of the weaker
jammer. If the jammers are of equal power and are not located such that the single
available null can effectively attenuate both sources, the two-microphone array will
perform comparably to the underlying fixed beamformer.
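The single degree of freedom available to a two-microphone array can be illustrated at one frequency. The sketch below is a simplified narrowband, free-field, far-field example, not the broadband adaptive system studied here: it solves for two complex weights that pass a straight-ahead target with unit gain and place a null on one jammer, then evaluates the response toward a second jammer. The 14-cm spacing, 1-kHz frequency, jammer angles, and all function names are assumptions chosen for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering(theta_deg, spacing_m, freq_hz):
    """Phasors at two microphones for a far-field plane wave.

    theta_deg is measured from broadside (0 = straight ahead).
    """
    delay = spacing_m * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND
    return (1.0 + 0.0j, cmath.exp(-2j * math.pi * freq_hz * delay))


def null_steering_weights(target, jammer):
    """Weights giving unit response toward target and zero response toward jammer."""
    det = target[0] * jammer[1] - target[1] * jammer[0]
    if abs(det) < 1e-12:
        raise ValueError("target and jammer are indistinguishable at this frequency")
    return (jammer[1] / det, -jammer[0] / det)


def response_db(weights, source):
    """Array gain in dB toward source (floored to avoid log of zero)."""
    out = weights[0] * source[0] + weights[1] * source[1]
    return 20.0 * math.log10(max(abs(out), 1e-12))


SPACING_M, FREQ_HZ = 0.14, 1000.0  # hypothetical geometry and frequency
target = steering(0.0, SPACING_M, FREQ_HZ)
jammer_a = steering(45.0, SPACING_M, FREQ_HZ)
jammer_b = steering(-60.0, SPACING_M, FREQ_HZ)

w = null_steering_weights(target, jammer_a)
for name, src in [("target", target), ("jammer A", jammer_a), ("jammer B", jammer_b)]:
    print(f"{name}: {response_db(w, src):.1f} dB")
```

In this example the null is spent entirely on the first jammer, leaving the second essentially unattenuated; an adaptive system minimizing total output power would instead settle on a compromise of the kind described above.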

In the presence of a large number of independent, directional jammers, these
theoretical considerations suggest that increasing the number of microphones will always
improve performance. However, for head-sized arrays and realistic levels of sensor
noise, the incremental improvement is negligible beyond 4-6 microphones (Peterson,
1989). This is presumably because for the frequencies and dimensions of interest,
additional microphones result in spatial oversampling. The current work considers
five-microphone arrays due to this limitation and also to facilitate comparison with
other work on microphone arrays for hearing aids (Soede et al., 1993a,b; Hoffman et
al., 1994; Stadler and Rabinowitz, 1993).

The  simulation  results  presented  in  Ch.  7  also  demonstrate  one  disadvantage  of 
additional microphones. If the number of microphones exceeds the number of
independent jammers by two or more, then the system has more degrees of freedom than
required  to  direct  nulls  at  all  of  the  jammers.  In  this  case,  if  the  target  is  misaligned, 
an  unused  degree  of  freedom  may  be  used  to  direct  a  null  at  the  misaligned  target. 

The  interaction  between  the  number  of  microphones  and  the  number  of  jammers 
discussed above could be demonstrated using multiple jammers in computer
simulations similar to those described in Ch. 7. However, designing such simulations
to  provide  results  meaningful  to  the  hearing-aid  application  is  not  possible  due  to  a 
lack  of  information  about  commonly  encountered  acoustic  environments.  How  often 
do  listeners  encounter  more  than  one  directional  jammer?  When  multiple  jammers 
do  exist,  what  are  their  angular  distributions  and  relative  power  levels?  Future  work 
should  either  address  these  questions  before  simulating  multiple  jammer  environments 
or  proceed  directly  to  building  portable  microphone  arrays  for  evaluation  in  a  variety 
of  real  acoustic  environments  encountered  in  everyday  activities. 

8.2.2  Reverberation 

It is also important to consider the effect of the number of microphones on
performance in the presence of extreme reverberation. As discussed in Sec. 7.3.2, the
performance of adaptive systems approaches that of the underlying fixed system with
increasing reverberation. Also, for the array dimensions and frequency ranges
considered here, the number of microphones alone does not have a substantial impact
on the intelligibility-weighted directivity for broadside arrays with uniform weights.
Therefore, in extreme reverberation, the adaptive system performance measured by
G_I will approximate the fixed system performance predicted by D_I, which is not
affected by the number of microphones.

A consideration neglected in the previous discussion of fixed array performance is
the noise sensitivity, which is a measure of robustness of fixed systems (Stadler and
Rabinowitz, 1993). In general, when arrays with different numbers of microphones
provide the same level of intelligibility-weighted directivity (D_I), the array with more
microphones will have lower noise sensitivity. For broadside arrays of M omnidirectional
microphones with uniform weights,^ the noise sensitivity is 1/M for all frequencies, that
is, -3 dB for the two-microphone array and -7 dB for the five-microphone array,
relative to 0 dB for a single omnidirectional element. This indicates that the fixed
five-microphone array is more robust to any noise that is uncorrelated between
microphones, including internal sensor noise, gain mismatch, and microphone placement
error. These sources of error were not considered in Ch. 7, although target
misalignment was included in the simulations.
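The noise-sensitivity figures quoted above follow from the standard result that averaging M microphone signals with equal weights reduces the power of noise that is uncorrelated between microphones by a factor of 1/M, while leaving a straight-ahead target unchanged. A brief sketch of that calculation follows; the function name is hypothetical.

```python
import math


def uniform_noise_sensitivity_db(num_mics):
    """Noise sensitivity, in dB re a single element, of num_mics uniformly
    weighted microphones: 10*log10(1/M) for noise uncorrelated between sensors."""
    if num_mics < 1:
        raise ValueError("num_mics must be at least 1")
    return 10.0 * math.log10(1.0 / num_mics)


for m in (1, 2, 5):
    print(f"{m} microphone(s): {uniform_noise_sensitivity_db(m):.1f} dB")
```

The results for two and five microphones round to the -3 dB and -7 dB values given in the text.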

Another issue to consider is the robustness of the underlying fixed array in the
presence of headshadow. The design and analysis of fixed arrays are based on the
assumption that the array is in free space, while the application requires that the array
be worn on the head. Soede et al. (1993a) showed that although polar patterns are
substantially different for array responses measured in free space and on a mannikin,
the frequency-dependent directivity indices are comparable. In particular, for a 10-cm
endfire array of five cardioid microphones along the temple of eyeglass frames,
the directivity index was about 1 dB lower when the array was on the head, relative
to free space. For a 14-cm broadside array along the front of eyeglass frames, the
directivity index was similar for measurements made in free space and on the head.
For the broadside array, some frequencies showed improved directivity because the
head provided additional attenuation of sources arriving from the rear. Future work
should consider comparable two-microphone arrays to determine if the number of
microphones affects the robustness of fixed arrays to headshadow effects.®

^This  corresponds  to  the  underlying  fixed  array  for  the  systems  considered  in  Ch.  7. 

^It  has  been  shown  that  two-microphone  adaptive  arrays  are  robust  to  the  effects  of  headshadow 
(Greenberg  and  Zurek,  1992). 
8.2.3 Summary of advantages of two- and five-microphone arrays

Based  on  the  results  of  computer  simulations,  it  is  not  possible  to  recommend  the 
number  of  sensors  required  in  a  microphone-array  hearing  aid.  However,  the  relevant 
issues  are  the  following. 

The  five-microphone  array  is 

•  more  effective  against  multiple  jammers,  and 

•  more  robust  in  extreme  reverberation  (due  to  lower  noise  sensitivity  of  the 
underlying  fixed  beamformer). 

The  two-microphone  array  is 

•  less  susceptible  to  target  cancellation  for  the  same  number  of  jammers, 

•  simpler  to  implement  (in  both  hardware  and  software),  and 

•  more  cosmetically  acceptable  (two  ear-level  devices  rather  than  five  elements  on 
eyeglass  frames  or  a  headband). 

Some  of  the  advantages  (and  disadvantages)  of  two-  and  five-microphone  arrays 
could  be  combined  in  a  system  that  uses  the  average  of  five  microphone  signals  for 
the  primary  channel,  and  the  difference  between  two  of  those  signals  as  the  input  to 
a single reference channel. Such a system would combine the robustness of a five-
microphone fixed system with the resistance of the two-microphone system to target
cancellation.  It  would  also  require  less  computation  than  the  five-microphone  system 
with  four  reference  channels.  However,  it  would  still  require  the  user  to  wear  an  array 
of  five  microphones  and  would  not  steer  independent  nulls  against  multiple  directional 
jammers.  Even  so,  such  a  system  may  be  appropriate  if  future  studies  indicate  that 
the  increased  robustness  of  a  five-microphone  array  is  required  and  that  multiple 
directional  jammers  are  rarely  encountered  in  everyday  listening  environments. 
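The channel formation for such a hybrid system can be sketched as follows. This is an illustration of the idea only: the text does not specify which two microphones form the difference, so the `ref_pair` parameter and the function name are hypothetical, and the adaptive filter that would follow the reference channel is omitted.

```python
def hybrid_channels(mic_frames, ref_pair=(0, 1)):
    """Form primary and reference channels from multi-microphone samples.

    mic_frames: list of per-sample lists, one value per microphone.
    ref_pair:   indices of the two microphones whose difference forms the
                single reference channel (an arbitrary choice in this sketch).
    """
    a, b = ref_pair
    primary = [sum(frame) / len(frame) for frame in mic_frames]   # average of all microphones
    reference = [frame[a] - frame[b] for frame in mic_frames]     # difference of two microphones
    return primary, reference


# A perfectly aligned (straight-ahead) target reaches every microphone of a
# broadside array identically, so it appears in the primary channel only.
aligned_target = [[0.5] * 5, [-1.0] * 5, [0.25] * 5]
primary, reference = hybrid_channels(aligned_target)
print(primary)
print(reference)
```

For the aligned target the reference channel is identically zero, which is the property that protects the target from cancellation; with only one reference channel, the system has a single adaptive null regardless of the number of microphones averaged in the primary channel.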
8.3 Directional microphones


The  motivation  for  using  directional  microphones  in  microphone-array  hearing  aids 
is  the  desire  to  improve  jammer  cancellation,  particularly  in  extreme  reverberation. 
Since  the  performance  of  an  adaptive  system  approaches  that  of  the  underlying  fixed 
system  in  extreme  reverberation,  an  obvious  approach  is  to  maximize  the  directivity 
of the underlying fixed system. The design of fixed arrays for maximal directivity
has been studied extensively in the context of the hearing-aid application (Soede et
al.,  1993a,b;  Stadler  and  Rabinowitz,  1993;  Kates,  1993).  One  way  to  improve  the 
directivity  of  fixed  arrays  is  the  use  of  directional  microphones.  Other  researchers 
have  also  considered  the  use  of  directional  microphones  in  adaptive  arrays  (Weiss, 
1987;  Schwander  and  Levitt,  1987;  McKinney  and  DeBrunner,  1994). 

The theoretical polar patterns of cardioid, supercardioid, and hypercardioid
microphones, which are independent of frequency, are shown in Fig. 8.1. The directional
microphones provide a gain of 0 dB to straight-ahead target sources and attenuate
jammers arriving from all other directions. For directional jammers, the amount of
attenuation depends on the angle of arrival. Figure 8.1 shows that a jammer arriving
from 45° is attenuated roughly 2 dB by all three directional microphones, while a
jammer arriving from 90° is attenuated 6 dB, 9 dB, and 12 dB by cardioid,
supercardioid, and hypercardioid microphones, respectively. For jammers approaching a
diffuse field, such as in extreme reverberation, the amount of attenuation provided
by a single microphone is given by its directivity: 4.7 dB, 5.6 dB, and 6.0 dB for
cardioid, supercardioid, and hypercardioid microphones, respectively, where all values
are relative to 0 dB for a single omnidirectional microphone.
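These attenuation figures can be approximated from the textbook first-order microphone pattern r(θ) = a + (1 - a) cos θ. The sketch below is an assumption-laden illustration: the pattern parameters a = 0.5, 0.366, and 0.25 are commonly cited textbook values for cardioid, supercardioid, and hypercardioid microphones, not values taken from this work, and the function names are hypothetical.

```python
import math

# Assumed first-order pattern parameters: r(theta) = a + (1 - a) * cos(theta).
PATTERNS = {"cardioid": 0.5, "supercardioid": 0.366, "hypercardioid": 0.25}


def attenuation_db(a, theta_deg):
    """Attenuation (positive dB) of a plane wave arriving from theta_deg,
    relative to the 0 dB response at theta = 0."""
    r = abs(a + (1.0 - a) * math.cos(math.radians(theta_deg)))
    return float("inf") if r == 0.0 else -20.0 * math.log10(r)


def directivity_index_db(a):
    """Directivity index in dB; the mean squared response over the sphere
    for a first-order pattern is a^2 + (1 - a)^2 / 3."""
    return -10.0 * math.log10(a * a + (1.0 - a) ** 2 / 3.0)


for name, a in PATTERNS.items():
    print(f"{name}: 45 deg {attenuation_db(a, 45):.1f} dB, "
          f"90 deg {attenuation_db(a, 90):.1f} dB, "
          f"directivity {directivity_index_db(a):.1f} dB")
```

With these assumed parameters the 90° attenuations round to the 6, 9, and 12 dB quoted above, and the directivity values are close to, though not identical with, the quoted 4.7, 5.6, and 6.0 dB.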

Stadler  and  Rabinowitz  (1993)  studied  fixed  systems  based  on  omnidirectional 
and  directional  microphones  in  14-cm  broadside  arrays  and  11-cm  endfire  arrays. 
They  considered  arrays  of  two  to  seven  elements  and  various  methods  of  selecting  the 
fixed  weights.  Their  results  for  broadside  arrays  show  that  regardless  of  the  type  of 
microphone,  the  more  complex  weighting  schemes  provide  only  slight  improvements 
in directivity over uniform weights, and at a cost of increased noise sensitivity. That
result indicates that obtaining the primary channel from the mean of the microphone
signals (uniform weights) is sufficient in the current application.

Figure 8-1: Polar responses of single directional microphones: cardioid (top),
supercardioid (middle), and hypercardioid (bottom).

                          number of microphones
                       1              2              5
microphone type     D_I   Ψ_I      D_I   Ψ_I      D_I   Ψ_I
omnidirectional     0.0   0.0      2.5  -3.0      ...   ...
cardioid            4.7   7.9      7.1   4.9      ...   ...
supercardioid       5.6   5.3      7.8   2.3      ...   ...

Table 8.1: Values of intelligibility-weighted directivity, D_I, and intelligibility-weighted
noise sensitivity, Ψ_I, both in dB, for a single microphone or a 14-cm broadside array
with uniform weights. All values taken from Stadler and Rabinowitz (1993), except
those for a two-microphone array of omnidirectional elements, which were computed
using equivalent methods.

Stadler and Rabinowitz (1993) calculated intelligibility-weighted directivities, D_I,
and noise sensitivities, Ψ_I, for 14-cm broadside arrays of two and five elements using
uniform weights with omnidirectional and directional microphones.^ Table 8.1
summarizes values relevant to the current discussion. These results show that for fixed
arrays of constant length, additional microphones have little effect on directivity, but
do improve the noise sensitivity, as discussed in Secs. 7.3.2 and 8.2.2. Clearly, using
either cardioid or supercardioid microphones instead of omnidirectional ones provides
substantial improvements in directivity, with tolerable levels of noise sensitivity.

It is also necessary to consider the effect of directional microphones on the
performance of the adaptive system against directional jammers. Obviously, the use of
directional elements will have a large and beneficial effect against jammers arriving
from behind, because they eliminate the front-back symmetry of broadside arrays. As
discussed above, the directional elements have no effect on the target signal and
attenuate directional jammers by an amount that depends on the angle of arrival. That
attenuation is effectively a shift in the TJR of the input signals seen by the adaptive
system. For jammers arriving from forward directions, directional elements will shift
the input TJR by less than 6-12 dB, depending on the type of microphones used.
However, since the modified algorithm based on the methods presented in Chs. 4 and
5 is robust to changes in input TJR, the use of directional microphones should not
have a large effect on the performance of the adaptive system.

^In that work, the microphones referred to as hypercardioids are actually supercardioids.

The  computer  simulations  of  Ch.  7  could  be  modified  to  incorporate  directional 
microphones,  but  the  results  of  those  simulations  can  already  be  predicted.  Because 
the  jammer  was  located  at  either  45°  or  55°,  including  directional  microphones  is 
equivalent  to  shifting  the  input  TJR  by  2  dB  for  the  anechoic  condition,  and  the  results 
would  not  differ  substantially  from  those  obtained  with  omnidirectional  microphones. 
For  the  reverberant  conditions,  the  performance  would  improve  to  the  extent  that 
the  directional  microphones  increase  attenuation  of  reflections  from  all  directions.  In 
strong  reverberation,  the  performance  of  the  underlying  fixed  system  would  increase 
to roughly 7-8 dB, from the 1-3 dB shown by the rightmost 'x' in each plot in Fig. 7.4.

From  the  above  discussion,  it  is  clear  that  using  directional  elements  in  microphone- 
array  hearing  aids  should  have  a  beneficial  effect  on  the  performance  in  reverberation, 
with  no  cost  in  performance  or  processing.  Therefore,  arrays  constructed  for  future 
evaluations  should  incorporate  directional  microphones. 


8.4  Laboratory  and  field  tests 

The  preceding  sections  have  identified  a  number  of  issues  that  cannot  be  adequately 
investigated  by  computer  simulations.  Future  work  should  focus  on  construction  and 
evaluation  of  prototype  systems  with  two-  and  five-microphone  arrays  of  directional 
elements.  The  evaluations  should  be  designed  to  study  robustness  to  headshadow  and 
head  movements,  to  confirm  the  benefits  predicted  by  the  simulations,  to  assess  the 
response  in  the  presence  of  time-varying  jammer  sources,  and  to  study  the  number 
of  microphones  required,  that  is,  the  number  of  independent  jammers  commonly 
encountered  in  real  environments. 

The  first  set  of  tests  to  be  performed  with  a  real  system  should  consider  robustness 
to headshadow via physical measurements. For evaluating the underlying fixed
system, testing should compare the polar patterns and directivities with the array in free
space and on a mannikin, as measured by Soede et al. (1993a) for a five-microphone
array of cardioid elements. Similar measurements of polar patterns in free space and
on a mannikin head should be made for adaptive systems in the presence of directional
jammers in anechoic environments and in moderate reverberation.

In addition, physical measurements should be made in the presence of head movements,
to verify that the additional benefits of long adaptive filters can be obtained
under  realistic  conditions,  as  discussed  in  Sec.  7.3.2.  Previous  work  (Schwander  and 
Levitt,  1987)  has  considered  the  effect  of  head  movements  on  word  recognition  scores 
of  normal-hearing  subjects  listening  to  speech  processed  by  an  adaptive  noise  canceller 
in  a  reverberant  room.  That  system  used  an  800-point  (80  ms)  adaptive  filter  and 
obtained  the  reference  input  from  a  cardioid  microphone  mounted  on  the  listener’s 
head  and  facing  toward  the  rear.  Their  results  showed  that  although  head  movements 
reduce  the  effectiveness  of  the  noise  reduction  process,  that  reduction  is  small  relative 
to  the  benefit  of  the  processing.  Future  work  should  include  physical  measurements 
to  assess  the  effect  of  head  movements  on  the  generalized  sidelobe  canceller,  in  order 
to  quantify  the  practical  benefits  of  long  filters  in  reverberation. 

The next step is to perform intelligibility tests with normal-hearing and hearing-
impaired listeners. These tests should confirm the improvements predicted by the
simulations in a controlled environment, that is, with known direct-to-reverberant
ratio, input TJR, number of jammers, and jammer locations. In addition to measuring
intelligibility, these tests should solicit the subjects' subjective impressions and
attempt to quantify any other effects of the processing, such as ease of listening,
annoyance, etc. Introducing time-varying jammers under these controlled conditions
will also allow assessment of the algorithm's transient behavior and its effect on both
intelligibility and ease of listening.

Finally, construction of a battery-powered, wearable prototype will allow field tests
with hearing-impaired listeners. These tests will assess the potential of the systems
in real acoustic environments, rather than in a controlled laboratory setting. The
listener will be exposed to time-varying jammers, various numbers of jammers, different
levels of reverberation, and a range of input TJRs. Quantifying the performance
of systems outside of the laboratory will require developing a simple rating method
for the listener to use. It will also be useful to develop a monitoring scheme that
stores samples of the adaptive filter weights, the intermicrophone correlation, and
other measures obtained by the system. This information can be used to determine
how often the adaptive algorithm is significant [1] and to provide information about the
types and frequency of acoustic environments encountered by the listener in everyday
activities.


[1] When the weights are close to zero, the adaptive system performance is not substantially different
from that of the underlying fixed system.
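
The monitoring scheme described above could be prototyped along the following lines. This is only a minimal sketch: the function names, the weight-norm threshold of 0.05, and the use of a zero-lag normalized correlation are illustrative assumptions, not part of the system specified in this work.

```python
import numpy as np

def fraction_adaptive_significant(weight_snapshots, threshold=0.05):
    """Fraction of stored weight snapshots whose Euclidean norm exceeds
    `threshold` (a hypothetical value). Snapshots with near-zero weights are
    counted as times when the system behaves like the underlying fixed array."""
    w = np.atleast_2d(np.asarray(weight_snapshots, dtype=float))
    norms = np.linalg.norm(w, axis=1)
    return float(np.mean(norms > threshold))

def intermicrophone_correlation(x1, x2):
    """Normalized zero-lag correlation between two microphone frames."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    denom = np.sqrt(np.sum(x1 ** 2) * np.sum(x2 ** 2))
    return float(np.sum(x1 * x2) / denom) if denom > 0 else 0.0

if __name__ == "__main__":
    # Four stored snapshots of a two-tap weight vector (made-up numbers).
    log = [[0.0, 0.0], [0.3, 0.4], [0.01, 0.0], [1.0, 0.0]]
    print(fraction_adaptive_significant(log))
```

Logging only summary statistics such as these, rather than raw audio, keeps the storage requirement of a wearable prototype small.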
Chapter 9


Conclusion 


This work is part of a larger project concerned with the development of microphone-
array hearing aids. As described in Ch. 2, previous work consisted of computing
theoretical performance limits (Peterson, 1989) and evaluating practical systems
(Peterson et al., 1990; Greenberg and Zurek, 1992). Those evaluations revealed several
problems  with  the  generalized  sidelobe  canceller  and  led  to  the  development  of  two  ad 
hoc  modifications  to  the  adaptive  algorithm.  The  current  work  uses  both  theoretical 
analysis  and  computer  simulations  to  formalize  previously  proposed  modifications, 
specifies  a  modified  algorithm  for  use  in  an  adaptive  microphone-array  hearing  aid, 
and  demonstrates  the  benefits  of  that  algorithm  in  a  variety  of  simulated  acoustic 
environments.  Concurrent  work  (Welker,  1994)  has  considered  adaptive  arrays  with 
binaural  outputs,  to  prevent  “tunnel  hearing”  imposed  by  a  directional  hearing  aid 
with  a  monaural  output.  Future  work  will  consist  of  implementing  the  proposed 
systems  in  real-time  for  evaluations  in  laboratory  and  field  trials. 

The  current  work  contains  a  thorough  analysis  of  previously  proposed  methods 
for  controlling  adaptation  and  provides  guidelines  for  parameter  selection  applicable 
in  reverberation  and  for  arbitrary  numbers  of  microphones  (Chs.  4  and  5).  It  also 
contains  an  analysis  of  the  specific  causes  of  target  cancellation  in  reverberation  and 
reveals  that  a  simple  parameter  choice  can  solve  this  problem  (Ch.  6).  The  results  of 
Chs.  4-6  lead  to  specification  of  a  modified  algorithm  for  use  in  adaptive  microphone- 
array  hearing  aids. 

The modified algorithm was implemented and evaluated in computer simulations.
Chapter 7 contains the results of those simulations, which serve three purposes. First,
they demonstrate the effectiveness of the modifications and parameter choices determined
in Chs. 4-6. Second, they illustrate the levels of performance provided by
practical systems using different filter lengths and numbers of microphones, M,
in a variety of acoustic environments. Finally, they identify issues for further
investigation with a real-time system in laboratory and field tests. Chapter 8 consists of
a discussion of issues not resolved by the computer simulations and includes
recommendations for future work.

The result of this work is the specification of a relatively simple and robust broadside
adaptive array that is expected to provide a minimum of 7 dB interference
reduction in a very reverberant sound field, and much greater reduction when the
interference arrives predominantly via the direct path. In particular, this work supports
the  following  conclusions: 

•  The  modified  adaptive  algorithm  makes  the  generalized  sidelobe  canceller  robust 
to the problems of misalignment and misadjustment, which occur predominantly
at  high  TJR  (Sec.  7.3.1).  The  modifications  consist  of  the  sum  method  of 
normalizing  the  step-size  parameter  in  the  LMS  algorithm  (Ch.  4)  and  the 
correlation  method  of  controlling  adaptation  (Ch.  5). 

•  Using  a  relatively  short  primary  delay  prevents  cancellation  of  the  direct  target 
due  to  target  reflections  (Ch.  6). 

•  Very  large  intelligibility-weighted  gains  can  be  achieved  in  relatively  anechoic 
environments;  the  size  of  the  gains  decreases  with  increasing  reverberation. 
Substantial  benefits  are  often  provided  in  moderate  reverberation,  particularly  if 
relatively long filters (~100 ms) are used. In extreme reverberation, the performance
approaches that obtained using the underlying fixed system (Sec. 7.3.2).
This asymptotic performance can be improved by using arrays of directional
microphones (Sec. 8.3).

•  In many realistic environments, convergence of the modified algorithm is sufficiently
rapid for processing speech signals (Sec. 7.4).

•  In  the  presence  of  a  single  jammer  source,  there  is  no  advantage  to  using  more 
than two microphones (Sec. 7.3.2). Additional investigations are required to
determine  whether  more  than  two  microphones  are  beneficial  when  operating 
in  commonly  encountered  acoustic  environments  (Sec.  8.2). 

•  Real-time  processors  must  be  constructed  to  confirm  the  benefits  predicted  by 
the  simulations  and  to  permit  evaluation  by  human  listeners  under  realistic 
acoustic conditions that include headshadow, head movements, and time-varying
jammers (Sec. 8.4).
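
The first two conclusions in the list above can be illustrated with a minimal two-microphone sketch of a generalized sidelobe canceller. The exact definitions of the sum method and the correlation method are given in Chs. 4 and 5 and are not reproduced in this chapter, so the code below rests on stated assumptions: the step size is normalized by the sum of the reference and output energies over the filter length, and adaptation is inhibited whenever a running estimate of the intermicrophone correlation exceeds a threshold. The function name and all parameter values (`L`, `D`, `alpha`, `rho_max`, `beta`) are hypothetical.

```python
import numpy as np

def gsc_two_mic(x1, x2, L=16, D=8, alpha=0.5, rho_max=0.6, beta=0.01, eps=1e-12):
    """Two-microphone broadside GSC sketch (assumed forms, see lead-in).

    x1, x2  : microphone signals
    L       : adaptive filter length in samples
    D       : primary delay in samples (kept short)
    alpha   : dimensionless step-size constant
    rho_max : adaptation is inhibited when the estimated correlation exceeds this
    beta    : smoothing constant for the running correlation estimate
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    s = 0.5 * (x1 + x2)          # fixed beamformer steered to broadside
    r = x1 - x2                  # reference channel: nulls a broadside target
    w = np.zeros(L)
    r_buf = np.zeros(L)          # last L reference samples, newest first
    e_buf = np.zeros(L)          # last L output samples, newest first
    c11 = c22 = c12 = 0.0
    out = np.zeros(len(s))
    for n in range(len(s)):
        r_buf = np.roll(r_buf, 1)
        r_buf[0] = r[n]
        primary = s[n - D] if n >= D else 0.0
        e = primary - w @ r_buf
        out[n] = e
        e_buf = np.roll(e_buf, 1)
        e_buf[0] = e
        # Running estimate of the intermicrophone correlation coefficient.
        c11 = (1 - beta) * c11 + beta * x1[n] ** 2
        c22 = (1 - beta) * c22 + beta * x2[n] ** 2
        c12 = (1 - beta) * c12 + beta * x1[n] * x2[n]
        rho = c12 / np.sqrt(c11 * c22 + eps)
        if rho < rho_max:
            # Assumed sum normalization: reference energy plus output energy.
            mu = alpha / (r_buf @ r_buf + e_buf @ e_buf + eps)
            w += mu * e * r_buf
    return out, w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    j = rng.standard_normal(20000)
    x1, x2 = j, np.concatenate(([0.0], j[:-1]))   # jammer with a 1-sample delay
    out, _ = gsc_two_mic(x1, x2)
    fixed = 0.5 * (x1 + x2)
    print(np.mean(out[10000:] ** 2) / np.mean(fixed[10000:] ** 2))
```

Including the output energy in the normalization reduces the effective step size when a strong target is present in the output, which is the high-TJR condition under which misadjustment is most harmful.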

Appendix A


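The problem addressed in this appendix is the convolution of the approximate target
pdf given by (5.16) with the jammer pdf given by (5.8), according to (5.15). Those
equations are reproduced here using the unit step function, u(t), to indicate the ranges
over which they are nonzero:

$$ f_{\rho_t}(\rho) = \frac{u\big(\rho - \cos(kd\sin\theta_0)\big) - u(\rho - 1)}{2\big(1 - \cos(kd\sin\theta_0)\big)} + \frac{1}{2}\,\delta(\rho - 1) \tag{A.1} $$

$$ f_{\rho_j}(\rho) = \frac{u\big(\rho - \cos(kd)\big) - u\big(\rho - \cos(kd\sin\theta_0)\big)}{\left(\frac{\pi}{2} - \theta_0\right)\sqrt{(kd)^2 - (\arccos\rho)^2}\,\sqrt{1 - \rho^2}} \tag{A.2} $$

$$ f_{\rho|Y}(\rho\,|\,Y) = \frac{Y+1}{Y}\, f_{\rho_t}\!\left(\frac{Y+1}{Y}\,\rho\right) * \,(Y+1)\, f_{\rho_j}\big((Y+1)\rho\big). \tag{A.3} $$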
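
As a numerical sanity check on the jammer pdf labelled (A.2), the sketch below compares its closed-form cumulative distribution with a Monte Carlo estimate. It assumes, since the derivation in Ch. 5 is not reproduced here, that the jammer angle θ is uniformly distributed on [θ0, π/2], that the correlation for a single source is cos(kd sin θ), and that kd ≤ π. The function names and the values kd = 2.0 and θ0 = 0.3 rad are illustrative only.

```python
import numpy as np

def jammer_pdf(rho, kd, theta0):
    """Jammer pdf: 1 / ((pi/2 - theta0) sqrt(kd^2 - arccos(rho)^2) sqrt(1 - rho^2))
    on cos(kd) < rho < cos(kd sin(theta0)), and zero elsewhere."""
    rho = np.asarray(rho, dtype=float)
    lo, hi = np.cos(kd), np.cos(kd * np.sin(theta0))
    inside = (rho > lo) & (rho < hi)
    safe = np.where(inside, rho, 0.5 * (lo + hi))  # keeps the square roots valid
    dens = 1.0 / ((np.pi / 2 - theta0)
                  * np.sqrt(kd ** 2 - np.arccos(safe) ** 2)
                  * np.sqrt(1.0 - safe ** 2))
    return np.where(inside, dens, 0.0)

def jammer_cdf(rho, kd, theta0):
    """Closed-form integral of jammer_pdf from cos(kd) up to rho."""
    rho = np.clip(np.asarray(rho, dtype=float),
                  np.cos(kd), np.cos(kd * np.sin(theta0)))
    ratio = np.clip(np.arccos(rho) / kd, 0.0, 1.0)
    return (np.pi / 2 - np.arcsin(ratio)) / (np.pi / 2 - theta0)

if __name__ == "__main__":
    kd, theta0 = 2.0, 0.3
    rng = np.random.default_rng(0)
    theta = rng.uniform(theta0, np.pi / 2, 200_000)   # assumed angle distribution
    rho_j = np.cos(kd * np.sin(theta))
    for r in (-0.2, 0.2, 0.6):
        print(r, np.mean(rho_j <= r), float(jammer_cdf(r, kd, theta0)))
```

The same arcsine form reappears when the convolution integrand is integrated later in this appendix.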


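The two expressions to be convolved are determined by making the appropriate
substitutions in (A.1) and (A.2):

$$ \frac{Y+1}{Y}\, f_{\rho_t}\!\left(\frac{Y+1}{Y}\,\rho\right) = \frac{(Y+1)\left[u\!\left(\rho - \frac{Y}{Y+1}\cos(kd\sin\theta_0)\right) - u\!\left(\rho - \frac{Y}{Y+1}\right)\right]}{2Y\big(1 - \cos(kd\sin\theta_0)\big)} + \frac{1}{2}\,\delta\!\left(\rho - \frac{Y}{Y+1}\right) \tag{A.4} $$

and

$$ (Y+1)\, f_{\rho_j}\big((Y+1)\rho\big) = \frac{(Y+1)\left[u\!\left(\rho - \frac{\cos(kd)}{Y+1}\right) - u\!\left(\rho - \frac{\cos(kd\sin\theta_0)}{Y+1}\right)\right]}{\left(\frac{\pi}{2} - \theta_0\right)\sqrt{(kd)^2 - \big(\arccos((Y+1)\rho)\big)^2}\,\sqrt{1 - \big((Y+1)\rho\big)^2}}. \tag{A.5} $$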


There are two cases that result from this convolution, with each case having three
distinct regions. Case A occurs when the jammer pdf is wider than the target pdf,
that is, when

$$ Y < \frac{\cos(kd\sin\theta_0) - \cos(kd)}{1 - \cos(kd\sin\theta_0)} = Y_0. \tag{A.6} $$

Case B occurs when the target pdf is wider than the jammer pdf, that is, when

$$ Y > \frac{\cos(kd\sin\theta_0) - \cos(kd)}{1 - \cos(kd\sin\theta_0)} = Y_0. \tag{A.7} $$

For Case A, the three regions of interest are bounded by

$$ \rho_1 = \frac{Y\cos(kd\sin\theta_0) + \cos(kd)}{Y+1} \tag{A.8} $$

$$ \rho_{2a} = \frac{Y + \cos(kd)}{Y+1} \tag{A.9} $$

$$ \rho_{3a} = \frac{Y\cos(kd\sin\theta_0) + \cos(kd\sin\theta_0)}{Y+1} = \cos(kd\sin\theta_0) \tag{A.10} $$

$$ \rho_4 = \frac{Y + \cos(kd\sin\theta_0)}{Y+1}. \tag{A.11} $$
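
The case selection and region boundaries can be computed directly, as in the sketch below. The function name and the example values are hypothetical. The Case B ordering uses the bounds listed for that case later in this appendix, which swap the two interior boundaries. The source does not make clear which case covers Y = Y0 exactly, so the code arbitrarily assigns it to Case B. The expressions require θ0 > 0 so that the denominator of Y0 is nonzero.

```python
import math

def case_and_bounds(Y, kd, theta0):
    """Return (case, Y0, bounds), where bounds are the four region boundaries
    in ascending order for the selected case."""
    c0 = math.cos(kd * math.sin(theta0))   # correlation at the edge of the target region
    ck = math.cos(kd)                      # correlation for an endfire jammer
    Y0 = (c0 - ck) / (1.0 - c0)
    rho1 = (Y * c0 + ck) / (Y + 1.0)
    rho4 = (Y + c0) / (Y + 1.0)
    edge_delta = (Y + ck) / (Y + 1.0)      # rho_2a in Case A, rho_3b in Case B
    edge_c0 = c0                           # rho_3a in Case A, rho_2b in Case B
    if Y < Y0:
        return "A", Y0, (rho1, edge_delta, edge_c0, rho4)
    return "B", Y0, (rho1, edge_c0, edge_delta, rho4)

if __name__ == "__main__":
    for Y in (1.0, 20.0):
        print(Y, case_and_bounds(Y, kd=2.0, theta0=0.3))
```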


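The conditional pdf is determined by substituting (A.4) and (A.5) into (A.3), giving

$$
\begin{aligned}
f_{\rho|Y}(\rho\,|\,Y) &= \int (Y+1)\, f_{\rho_j}\big((Y+1)\rho_0\big)\; \frac{Y+1}{Y}\, f_{\rho_t}\!\left(\frac{Y+1}{Y}(\rho - \rho_0)\right) d\rho_0 \\
&= \frac{Y+1}{2\left(\frac{\pi}{2} - \theta_0\right)} \Bigg\{ \int_{\frac{\cos(kd)}{Y+1}}^{\rho - \frac{Y}{Y+1}\cos(kd\sin\theta_0)} X(\rho,\rho_0)\, d\rho_0 \,\big[u(\rho - \rho_1) - u(\rho - \rho_{2a})\big] \\
&\qquad + \int_{\rho - \frac{Y}{Y+1}}^{\rho - \frac{Y}{Y+1}\cos(kd\sin\theta_0)} X(\rho,\rho_0)\, d\rho_0 \,\big[u(\rho - \rho_{2a}) - u(\rho - \rho_{3a})\big] \\
&\qquad + \int_{\rho - \frac{Y}{Y+1}}^{\frac{\cos(kd\sin\theta_0)}{Y+1}} X(\rho,\rho_0)\, d\rho_0 \,\big[u(\rho - \rho_{3a}) - u(\rho - \rho_4)\big] \Bigg\}
\end{aligned}
\tag{A.12}
$$

for $Y < Y_0$, where

$$ X(\rho,\rho_0) = \frac{1}{\sqrt{(kd)^2 - \big(\arccos((Y+1)\rho_0)\big)^2}\,\sqrt{1 - \big((Y+1)\rho_0\big)^2}} \left[ \frac{Y+1}{Y\big(1 - \cos(kd\sin\theta_0)\big)} + \delta\!\left(\rho - \rho_0 - \frac{Y}{Y+1}\right) \right]. \tag{A.13} $$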
Integrating (A.13) produces

$$ \int X(\rho,\rho_0)\, d\rho_0 = \frac{-\arcsin\!\left(\dfrac{\arccos((Y+1)\rho_0)}{kd}\right)}{Y\big(1 - \cos(kd\sin\theta_0)\big)} + \frac{1}{\sqrt{(kd)^2 - \big(\arccos((Y+1)\rho - Y)\big)^2}\,\sqrt{1 - \big((Y+1)\rho - Y\big)^2}} + C \tag{A.14} $$

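where C is any constant. Substituting (A.14) into (A.12) produces the result for Case
A:

$$
\begin{aligned}
f_{\rho|Y}(\rho\,|\,Y) = {} & \frac{Y+1}{2Y\left(\frac{\pi}{2} - \theta_0\right)\big(1 - \cos(kd\sin\theta_0)\big)} \\
& \times \Bigg\{ \left[ \frac{\pi}{2} - \arcsin\!\left(\frac{\arccos\big((Y+1)\rho - Y\cos(kd\sin\theta_0)\big)}{kd}\right) \right] \big[u(\rho - \rho_1) - u(\rho - \rho_{2a})\big] \\
& \quad + \Bigg[ \frac{Y\big(1 - \cos(kd\sin\theta_0)\big)}{\sqrt{(kd)^2 - \big(\arccos((Y+1)\rho - Y)\big)^2}\,\sqrt{1 - \big((Y+1)\rho - Y\big)^2}} + \arcsin\!\left(\frac{\arccos\big((Y+1)\rho - Y\big)}{kd}\right) \\
& \qquad\quad - \arcsin\!\left(\frac{\arccos\big((Y+1)\rho - Y\cos(kd\sin\theta_0)\big)}{kd}\right) \Bigg] \big[u(\rho - \rho_{2a}) - u(\rho - \rho_{3a})\big] \\
& \quad + \Bigg[ \frac{Y\big(1 - \cos(kd\sin\theta_0)\big)}{\sqrt{(kd)^2 - \big(\arccos((Y+1)\rho - Y)\big)^2}\,\sqrt{1 - \big((Y+1)\rho - Y\big)^2}} + \arcsin\!\left(\frac{\arccos\big((Y+1)\rho - Y\big)}{kd}\right) - \theta_0 \Bigg] \big[u(\rho - \rho_{3a}) - u(\rho - \rho_4)\big] \Bigg\}.
\end{aligned}
\tag{A.15}
$$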
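
The Case A result can be checked numerically against a simulation. The sketch below evaluates the Case A expression and compares the probability mass it assigns to a few intervals with a Monte Carlo estimate. It relies on assumptions inferred from the scaling in (A.3) through (A.5) rather than stated in this appendix: the correlation is modelled as ρ = (Y ρ_t + ρ_j)/(Y + 1); the approximate target correlation equals 1 with probability 1/2 and is otherwise uniform on [cos(kd sin θ0), 1]; and the jammer angle is uniform on [θ0, π/2]. Function names and test values are illustrative, and the test intervals avoid the integrable singularity at ρ_2a.

```python
import numpy as np

def pdf_case_a(rho, Y, kd, theta0):
    """Conditional pdf for Case A (Y < Y0), evaluated region by region."""
    rho = np.asarray(rho, dtype=float)
    c0, ck = np.cos(kd * np.sin(theta0)), np.cos(kd)
    r1 = (Y * c0 + ck) / (Y + 1)
    r2a = (Y + ck) / (Y + 1)
    r3a = c0
    r4 = (Y + c0) / (Y + 1)

    def asin_acos(x):
        # arcsin(arccos(x)/kd); clipping keeps unused out-of-region values finite
        return np.arcsin(np.clip(np.arccos(np.clip(x, -1.0, 1.0)) / kd, 0.0, 1.0))

    x_u = (Y + 1) * rho - Y * c0            # argument set by the edge of the uniform part
    x_d = np.clip((Y + 1) * rho - Y, -1.0, 1.0)   # argument set by the impulse at rho_t = 1
    with np.errstate(divide="ignore", invalid="ignore"):
        delta_term = Y * (1 - c0) / (np.sqrt(kd ** 2 - np.arccos(x_d) ** 2)
                                     * np.sqrt(1 - x_d ** 2))
    reg1 = (rho >= r1) & (rho < r2a)
    reg2 = (rho >= r2a) & (rho < r3a)
    reg3 = (rho >= r3a) & (rho < r4)
    inner = np.where(reg1, np.pi / 2 - asin_acos(x_u), 0.0)
    inner = np.where(reg2, delta_term + asin_acos(x_d) - asin_acos(x_u), inner)
    inner = np.where(reg3, delta_term + asin_acos(x_d) - theta0, inner)
    return (Y + 1) / (2 * Y * (np.pi / 2 - theta0) * (1 - c0)) * inner

def sample_rho(n, Y, kd, theta0, rng):
    """Monte Carlo samples of rho under the assumed model (see lead-in)."""
    c0 = np.cos(kd * np.sin(theta0))
    rho_j = np.cos(kd * np.sin(rng.uniform(theta0, np.pi / 2, n)))
    rho_t = np.where(rng.random(n) < 0.5, 1.0, rng.uniform(c0, 1.0, n))
    return (Y * rho_t + rho_j) / (Y + 1)

def bin_mass(a, b, Y, kd, theta0, m=20000):
    """Midpoint-rule integral of pdf_case_a over [a, b]."""
    mid = a + (np.arange(m) + 0.5) * (b - a) / m
    return float(np.sum(pdf_case_a(mid, Y, kd, theta0)) * (b - a) / m)

if __name__ == "__main__":
    Y, kd, theta0 = 1.0, 2.0, 0.3
    rho = sample_rho(400_000, Y, kd, theta0, np.random.default_rng(1))
    for a, b in ((0.21, 0.29), (0.40, 0.80), (0.84, 0.90)):
        print((a, b), np.mean((rho >= a) & (rho < b)), bin_mass(a, b, Y, kd, theta0))
```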


Similarly, for Case B, the three regions of interest are bounded by

$$ \rho_1 = \frac{Y\cos(kd\sin\theta_0) + \cos(kd)}{Y+1} \tag{A.16} $$

$$ \rho_{2b} = \cos(kd\sin\theta_0) = \rho_{3a} \tag{A.17} $$

$$ \rho_{3b} = \frac{Y + \cos(kd)}{Y+1} \tag{A.18} $$

$$ \rho_4 = \frac{Y + \cos(kd\sin\theta_0)}{Y+1}. \tag{A.19} $$

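The conditional pdf is again determined by substituting (A.4) and (A.5) into (A.3),
this time for $Y > Y_0$, giving

$$
\begin{aligned}
f_{\rho|Y}(\rho\,|\,Y) &= \int (Y+1)\, f_{\rho_j}\big((Y+1)\rho_0\big)\; \frac{Y+1}{Y}\, f_{\rho_t}\!\left(\frac{Y+1}{Y}(\rho - \rho_0)\right) d\rho_0 \\
&= \frac{Y+1}{2\left(\frac{\pi}{2} - \theta_0\right)} \Bigg\{ \int_{\frac{\cos(kd)}{Y+1}}^{\rho - \frac{Y}{Y+1}\cos(kd\sin\theta_0)} X(\rho,\rho_0)\, d\rho_0 \,\big[u(\rho - \rho_1) - u(\rho - \rho_{2b})\big] \\
&\qquad + \int_{\frac{\cos(kd)}{Y+1}}^{\frac{\cos(kd\sin\theta_0)}{Y+1}} X(\rho,\rho_0)\, d\rho_0 \,\big[u(\rho - \rho_{2b}) - u(\rho - \rho_{3b})\big] \\
&\qquad + \int_{\rho - \frac{Y}{Y+1}}^{\frac{\cos(kd\sin\theta_0)}{Y+1}} X(\rho,\rho_0)\, d\rho_0 \,\big[u(\rho - \rho_{3b}) - u(\rho - \rho_4)\big] \Bigg\},
\end{aligned}
\tag{A.20}
$$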


which is identical to (A.12) except for the limits on the second integral. Substituting
(A.14) into (A.20) produces the result for Case B:


f.Mf>\Y)  = 


i'  +  l 


2Y'(|  —  ^o)(l  —  cos(fedsin  0o)) 
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